METHODS FOR PREDICTING FUNCTIONAL AND STRUCTURAL
PROPERTIES OF POLYPEPTIDES USING SEQUENCE MODELS



This application claims the benefit of provisional
application serial no. 60/ , filed December 29, 2000,
which was converted from U.S. Serial No. 09/753,020,
filed December 29, 2000, and which is incorporated
herein by reference.



BACKGROUND OF THE INVENTION 

The present invention relates generally to
interactions between ligands and polypeptides and, more
specifically, to determining structure-related properties
of a ligand when bound to different polypeptides.

Structure determination plays a central role in
chemistry and biology due to the correlation between the
structure of a molecule and its function. Although a
full understanding of this correlation is not yet
established, one can gain insight into the function of a
molecule from its deduced structure. Thus, the structure
can provide a strong basis for formulating experiments to
determine function. Conversely, the eventual disclosure
of a structure for a well-studied molecule can have a
significant effect in converging apparently disparate
observations of function into a consistent description of
the molecule's activity.



Practical applications that are becoming
increasingly dependent upon structure information
include, for example, the production of therapeutic
drugs. Therapeutic drugs can be designed by synthesizing
a molecule that mimics a ligand known to interact with a
target receptor. Alternatively, a therapeutic drug can
be designed by computer-assisted methods in which a
molecule is designed to dock to a binding site on a
receptor of known structure. By structure-based methods
such as these, lead compounds can be identified for
further development.

Using a similar structure-based approach, a
receptor can be engineered to yield improved or novel
functions. For example, changes can be made at a ligand
binding site in a polypeptide receptor based on the known
structure of the receptor. Given that a polypeptide
receptor can contain hundreds or even thousands of amino
acid residues, of which only a few may contact a ligand,
structural information is useful in identifying where
changes should be made in the polypeptide to alter ligand
binding. Polypeptide receptors engineered in this way can
be used for a variety of practical applications including,
for example, industrial catalysis, therapeutics and
bioremediation.

Although methods for structure determination
are evolving, it is currently difficult, costly and
time-consuming to determine the structure of a polypeptide
or ligand. It can be even more difficult to produce a
polypeptide-ligand complex in a condition allowing
determination of a structure for the bound complex.
Determining a structure for the receptor alone can have
limited value, particularly if the location of ligand
binding is difficult to identify due to the large size of
most polypeptide receptors. Similarly, determination of a
structure of an unbound ligand can have limited usefulness
because an unbound ligand has multiple conformations, and
the most stable conformation of an unbound ligand is often
different from its conformation when bound to a receptor.

Theoretical modeling of ligand-polypeptide
interactions is one alternative that has been attempted
in cases where the structure of the polypeptide-ligand
complex is not available. In this approach, a ligand is
fitted to a structure of a polypeptide. The polypeptide
structure used can be determined empirically or
theoretically. Theoretical determination of a
hypothetical molecular structure for a polypeptide by ab
initio methods is a relatively undeveloped method.
Another theoretical approach, referred to as homology
modeling, has been used to infer structure based on
comparison with molecules of known structure.

The successful application of homology modeling
to determining polypeptide-ligand interactions relies
upon choosing a correct polypeptide template for
comparison. In most cases, criteria for comparison are
unavailable or unreliable. For example, it is common to
produce a hypothetical structure of a target polypeptide
based on the empirically determined structure of a
template polypeptide having a similar sequence. However,
similarities in sequence do not always yield similar
structures and, conversely, similar structures have been
observed for two polypeptides having significantly
diverged sequences.

Thus, there exists a need for efficient methods
to identify properties of a ligand that confer binding
specificity for polypeptide receptors. A need also
exists for methods to classify polypeptides and ligands
according to structural characteristics. The present
invention satisfies these needs and provides related
advantages as well.

SUMMARY OF THE INVENTION 

The invention provides a method for identifying 
a polypeptide that binds a ligand. The method includes 
the steps of (a) comparing a sequence of a polypeptide to 
a sequence model for polypeptides that bind a ligand, 
wherein the sequence model comprises representations of 
amino acids consisting of a subset of amino acids, the 
subset of amino acids having one or more atoms within a
selected distance from a bound ligand in the polypeptides 
that bind the ligand; and (b) determining a relationship 
between the sequence and the sequence model, wherein a 
correspondence between the sequence and the sequence 
model identifies the polypeptide as a polypeptide that 
binds the ligand. 
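The subset of amino acids in step (a) can be illustrated with a minimal sketch that selects every residue having at least one atom within a chosen distance of any bound-ligand atom, then concatenates the selected one-letter codes in sequence order. The dictionary layout, the function names binding_site_residues and reduced_sequence, and the 4.5 Angstrom default (borrowed from the Figure 11 criterion) are assumptions for illustration, not the method as claimed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: residue number -> (one-letter code, atom coordinates)
Residues = Dict[int, Tuple[str, List[Coord]]]


def binding_site_residues(
    residues: Residues,
    ligand_atoms: Sequence[Coord],
    cutoff: float = 4.5,  # Angstroms; an assumed "selected distance"
) -> List[int]:
    """Return residue numbers, in ascending order, that have at least one
    atom within `cutoff` of any ligand atom."""
    selected = []
    for number in sorted(residues):
        _, atoms = residues[number]
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            selected.append(number)
    return selected


def reduced_sequence(residues: Residues, selected: Sequence[int]) -> str:
    """Concatenate the one-letter codes of the selected residues."""
    return "".join(residues[number][0] for number in selected)
```

Applied to each polypeptide that binds the ligand, the reduced sequences could serve as input for building a sequence model; the model-building procedure itself is not shown in this sketch.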

The invention also provides a method for
identifying a member of a pharmacofamily. The method
includes the steps of (a) comparing a sequence of a
polypeptide to a sequence model for polypeptides of a
pharmacofamily; and (b) determining a relationship
between the sequence and the sequence model, wherein a
correspondence between the sequence and the sequence
model identifies the polypeptide as a member of the
pharmacofamily.
The invention also provides a method for
identifying a member of a pharmacofamily, wherein the
method includes the steps of (a) comparing a sequence of
a polypeptide to a sequence model and a differential
sequence model; and (b) determining a relationship
between the sequence and the sequence models, wherein a
correspondence between the sequence and the sequence
models identifies the polypeptide as a member of the
pharmacofamily.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows pharmacoclusters identified from 
a database of 156 bound structures of nicotinamide 
adenine dinucleotide or nicotinamide adenine dinucleotide 
phosphate. Structures were generated using the overlay 
function in INSIGHT98 (Molecular Simulations Inc., San 
Diego, CA).

Figure 2 shows the nomenclature used herein for 
atom names in the NAD(P) molecule. 

Figure 3 shows conformer models with
interacting atoms from bound polypeptide and ordered
waters overlaid. Models in parts A through H were
derived from pharmacoclusters 1-8, respectively, as
described in the Examples. Overlaid atoms and waters
are identified as hydrogen bond donors (donors),
hydrogen bond acceptors (acceptors), sulfurs (sulfurs),
waters (waters), or atoms that can be either hydrogen bond
acceptors or hydrogen bond donors (acceptors/donors),
according to the legend in part A.





Figure 4 shows a portion of a 2D [1H,1H] NOESY
spectrum recorded with a 0.2 ml sample of 1 mM NADP and
200 μM of the enzyme 1-deoxy-D-xylulose 5-phosphate
reductoisomerase (DOXP). Atoms are identified according
to Figure 2. Chemical shifts are reported in parts per
million (ppm). Since the ligand is in fast exchange and in
excess over the polypeptide, cross peaks represent
transferred NOEs.

Figure 5 shows high affinity binding of
compound TTE0001.001.A07 to polypeptide enzymes of
pharmacofamily 1 (panel A) and pharmacofamily 8 (panel
B). Double reciprocal plots of reaction rate versus
concentration of NADH (panel A) or NADPH (panel B) are
shown for each enzyme in the presence of various
concentrations of compound TTE0001.001.A07.
Concentrations of compound TTE0001.001.A07 shown to the
right of plot A correspond to 7.1 μM (open triangles),
3.6 μM (closed triangles), 1.8 μM (open circles) and no
added compound (closed circles). Concentrations of
compound TTE0001.001.A07 shown to the right of plot B
correspond to 56.2 μM (open triangles), 37.5 μM (closed
triangles), 18.7 μM (open circles) and no added compound
(closed circles). Inhibitory dissociation constants (Kis)
determined from the data are shown in the upper left
corner of the respective plot.
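The captions do not state how Kis was extracted from the double reciprocal plots. A minimal sketch, assuming inhibition that is linear in the slope term (as for competitive inhibition with respect to NADH or NADPH), fits 1/v against 1/[S] at each inhibitor concentration and then replots the fitted slopes against [I]. If slope = (Km/Vmax)(1 + [I]/Kis), Kis equals the replot intercept divided by the replot slope. The function names and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_inhibition_constant(
    data: Dict[float, List[Tuple[float, float]]]
) -> float:
    """Estimate Kis from initial rates.

    `data` maps inhibitor concentration [I] to (substrate concentration,
    rate) pairs. Each series is linearized as 1/v versus 1/[S]; the fitted
    slopes are then replotted against [I]. Assuming
    slope = (Km / Vmax) * (1 + [I] / Kis), the replot intercept divided by
    the replot slope equals Kis.
    """
    inhibitor, slopes = [], []
    for conc_i, points in sorted(data.items()):
        xs = [1.0 / s for s, v in points]
        ys = [1.0 / v for s, v in points]
        slope, _intercept = fit_line(xs, ys)
        inhibitor.append(conc_i)
        slopes.append(slope)
    replot_slope, replot_intercept = fit_line(inhibitor, slopes)
    return replot_intercept / replot_slope
```

For plot A, for example, the four series (no added compound, 1.8, 3.6 and 7.1 μM) would supply four slopes for the replot.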

Figure 6 shows high affinity binding of
compound TTE0001.002.D02 to a polypeptide enzyme of
pharmacofamily 1. A double reciprocal plot of reaction
rate versus concentration of NADH is shown for the enzyme
in the presence of various concentrations of compound
TTE0001.002.D02. Concentrations of compound
TTE0001.002.D02 shown to the right of the plot
correspond to 20.6 μM (open triangles), 13.7 μM (closed
triangles), 6.9 μM (open circles) and no added compound
(closed circles). An inhibitory dissociation constant
(Kis) determined from the data is shown in the upper left
corner of the plot.

Figure 7 shows a pharmacophore model derived
from the coordinates presented in Table 3 for
pharmacofamily 1. Figure 7A shows a feature of the
pharmacophore model, namely a volume defining the shape
of conformer model 1, which is indicated by grey spheres
superimposed on the conformer model having coordinates
listed in Table 3C. Figure 7B shows three features of
the pharmacophore model: a hydrophobic region of the
nicotinamide ring; a hydrogen bond acceptor positioned at
the averaged coordinates for the location of 17 hydrogen
bond acceptors in the polypeptides of pharmacofamily 1;
and a hydrogen bond donor positioned where a hydrogen bond
donor of a ligand would be expected to have favorable
interactions with hydrogen bond acceptors observed in 11
out of 17 of the polypeptides in pharmacofamily 1.
Figure 7C shows a combination of the features of Figures
7A and 7B present in a pharmacophore model, superimposed
on the conformer model.

Figure 8 shows a plot of -ln(E) vs. L for the
results of searching the PDB with a Hidden Markov Model
trained with sequences from pharmacofamily 5. E is the
Expectation value and L is the location of identified
sequences in a list ranked by E value. Identified
sequences and respective E values are listed in Table 12.
True positives are plotted as diamonds and false
positives are plotted as circles.
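The axes of these ranked-search plots can be reproduced from a search report with a short sketch: hits are ranked by ascending E value, L is the 1-based rank, and each hit is paired with -ln(E) and a flag for whether it belongs to a supplied set of known true positives. The input format (identifier, E value) and the name rank_hits are assumptions. E values must be positive, so a hit reported with E = 0 would need a floor value before use.

```python
import math
from typing import List, Set, Tuple


def rank_hits(
    hits: List[Tuple[str, float]], true_ids: Set[str]
) -> List[Tuple[int, float, str, bool]]:
    """Return (L, -ln(E), identifier, is_true_positive) for each hit,
    ordered by ascending E value. All E values must be positive."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    return [
        (rank, -math.log(e_value), identifier, identifier in true_ids)
        for rank, (identifier, e_value) in enumerate(ranked, start=1)
    ]
```

Plotting the first two fields, with markers chosen by the last field, gives a chart of the kind described.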

Figure 9 shows a plot of -ln(E) vs. L for the
results of searching the PDB with a Hidden Markov Model
trained with a first set of sequences from pharmacofamily
3. E is the Expectation value and L is the location of
identified sequences in a list ranked by E value.
Identified sequences and respective E values are listed
in Table 13. True positives are plotted as diamonds and
false positives are plotted as circles.

Figure 10 shows a plot of -ln(E) vs. L for the
results of searching the PDB with a Hidden Markov Model
trained with a second set of sequences from
pharmacofamily 3. E is the Expectation value and L is
the location of identified sequences in a list ranked by
E value. True positives are plotted as diamonds and
false positives are plotted as circles.

Figure 11 shows a sequence alignment made from
a structural overlay of pharmacofamily 1. Amino acids
shown correspond to those within regions that overlap in
the structural overlay. All bolded letters are within
4.5 Angstroms of a ligand binding site. Underlining
indicates proximity to a cofactor ligand and/or substrate
ligand as follows: bold underlining indicates proximity to
a bound cofactor, double underlining indicates proximity
to a bound substrate, and dotted underlining indicates
proximity to both bound cofactor and bound substrate.
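The Figure 11 legend amounts to a per-residue classification by distance to each bound ligand. A minimal sketch follows. It assumes that the same 4.5 Angstrom cutoff used for bolding also defines proximity for underlining, which the caption does not state, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def near(atoms: Sequence[Coord], ligand: Sequence[Coord], cutoff: float) -> bool:
    """True if any residue atom lies within `cutoff` of any ligand atom."""
    return any(math.dist(a, b) <= cutoff for a in atoms for b in ligand)


def proximity_label(
    atoms: Sequence[Coord],
    cofactor: Sequence[Coord],
    substrate: Sequence[Coord],
    cutoff: float = 4.5,  # assumed to match the bolding criterion
) -> str:
    """Classify a residue following the Figure 11 underlining legend."""
    to_cofactor = near(atoms, cofactor, cutoff)
    to_substrate = near(atoms, substrate, cutoff)
    if to_cofactor and to_substrate:
        return "both"        # dotted underlining
    if to_cofactor:
        return "cofactor"    # bold underlining
    if to_substrate:
        return "substrate"   # double underlining
    return "none"
```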
Figure 12 shows a plot of -ln(E) vs. L for the
results of searching the PDB with a Hidden Markov Model
trained with sequences from pharmacofamily 1. E is the
Expectation value and L is the location of identified
sequences in a list ranked by E value. Identified
sequences and respective E values are listed in Table 15.
True positives are plotted as diamonds and false
positives are plotted as circles.



Figure 13 shows a plot of -ln(E) vs. L for the
results of a differential search of the PDB with a first
Hidden Markov Model trained with sequences from
pharmacofamily 1 and a second Hidden Markov Model trained
with sequences including residues proximal to a bound
ligand in polypeptides of pharmacofamily 1. E is the
Expectation value and L is the location of identified
sequences in a list ranked by E value. Identified
sequences and respective E values are listed in Table 16.
True positives are plotted as diamonds and false
positives are plotted as circles.

Figure 14 shows the data of Figure 12 overlaid
with XCorr values calculated for each sequence. XCorr
values are plotted as triangles, true positives are
plotted as squares and false positives are plotted as
circles.

DETAILED DESCRIPTION OF THE INVENTION 



The invention provides pharmacoclusters and
methods for identifying a pharmacocluster from bound
conformations of a ligand bound to different
polypeptides. The methods are applicable for identifying
a conformation-dependent property of a ligand based on
bound conformations of the ligand in a pharmacocluster.
The methods are also applicable for classifying
polypeptides, from a family of polypeptides that bind the
same ligand, into pharmacofamilies based on bound
conformations of the ligand. Accordingly, methods are
provided for grouping polypeptides into pharmacofamilies
by determining bound conformations of a ligand or a
conformation-dependent property of a ligand independent
of a determination of the structure of the polypeptide.
An advantage of classifying polypeptides according to
bound conformations of a ligand is that a pharmacofamily
is likely to contain polypeptides having greater binding
specificity for a particular molecule than other
polypeptides in the same family. Thus, the methods allow
identification of a pharmacofamily that can specifically
interact with a particular therapeutic agent or drug.

Additionally, the methods of the invention can
be used to determine a conformer model or pharmacophore
model based on a bound conformation or conformation-
dependent property of a ligand bound to polypeptides in a
pharmacofamily. The invention is therefore advantageous
in providing a model for the design and identification of
therapeutic compounds having specificity for a
pharmacofamily of polypeptides.

Further, the methods of the invention can be
used to identify structural properties and ligand binding
properties of polypeptides based on comparison of their
sequences to polypeptides in one or more
pharmacofamilies. An advantage of the invention is that
ligand binding properties can be identified for
polypeptides in a database for which sequence information
is readily available but structural and/or functional
properties are incompletely known or unavailable.

Another advantage of the invention is that the
methods provide a correlation between ligand
conformation, a parameter that is relatively easy to
measure, and polypeptide structure, a parameter of
tremendous value but often difficult to measure.
Therefore, the methods of the invention can be used to
determine structural characteristics of a polypeptide
based on a conformation-dependent property of a bound
ligand.

As used herein, the term "pharmacocluster"
refers to a collection of substantially the same bound
conformations of a ligand, or portion thereof, bound to
two or more polypeptides. A member conformation of a
pharmacocluster can have (1) a conformation that is more
similar to an average conformation of the members in its
own pharmacocluster than to that of any other
pharmacocluster and (2) a conformation that is more
similar to an average conformation of the members in its
own pharmacocluster than the most similar average
structures from different pharmacoclusters are to each
other, wherein the pharmacoclusters consist of
conformations of the same ligand or portion thereof. The
pharmacocluster is determined for a ligand bound to
different polypeptides but does not require that a
structure of the polypeptide be known or included as part
of a bound conformation of a ligand. A bound conformation
of a ligand can include the entire ligand structure or
selected atoms making up a portion of the complete atomic
composition of the ligand, so long as the number of atoms
provides sufficient information to distinguish one
pharmacocluster from another. A pharmacocluster can
include both the bound conformations of a ligand, or
portion thereof, and one or more atoms that both interact
with the ligand and are from a bound polypeptide. Thus, a
pharmacocluster can include conformational information of
1 or more, 2 or more, 5 or more, 10 or more, 20 or more,
30 or more, 40 or more, 50 or more or 100 or more atoms of
a ligand bound conformation.
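The two membership criteria can be checked directly once a dissimilarity measure is chosen. The sketch below assumes that conformations are lists of corresponding atom coordinates that have already been superimposed, and it uses RMSD over those atoms as the measure; the definition itself does not fix the measure, and the function names are hypothetical. Criterion (2) compares the member's distance to its own average with the smallest distance between the averages of any two different pharmacoclusters.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD over corresponding atoms of two superimposed conformations."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def satisfies_membership(
    conf: Sequence[Coord], own: str, averages: Dict[str, List[Coord]]
) -> bool:
    """Check criteria (1) and (2) for `conf` as a member of cluster `own`."""
    d_own = rmsd(conf, averages[own])
    # (1) closer to its own average than to any other cluster's average
    to_others = [rmsd(conf, avg) for name, avg in averages.items() if name != own]
    criterion_1 = all(d_own < d for d in to_others)
    # (2) closer to its own average than the two most similar averages
    # of different clusters are to each other
    names = list(averages)
    between = [
        rmsd(averages[x], averages[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    criterion_2 = not between or d_own < min(between)
    return criterion_1 and criterion_2
```

Criterion (2) is the stricter of the two when clusters lie close together: a conformation can sit nearest its own average yet still be farther from it than two neighbouring averages are from each other.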

Accordingly, portions of bound conformations of
two or more different ligands can be included in a ligand
pharmacocluster so long as the portions selected from
each ligand have a core bound conformation that is
substantially the same. A core bound conformation can
consist of portions of bound conformations of ligands
wherein the portions have identical structural formula
and conformation. A core bound conformation can also
consist of portions of bound conformations of ligands
wherein the portions have different structural formulas
so long as the portions have substantially the same
conformation. The structural formula, as it is
understood in the art, is a two-dimensional representation
of a molecule that identifies the atoms and the covalent
bonds between the atoms in the molecule. The structural
formula does not necessarily include information
sufficient to determine the conformation of a molecule.
For example, a common structural formula representation
of cyclohexane is a hexagon with two hydrogens attached
to each carbon in equivalent positions. However, a
stable conformation of cyclohexane in solution may appear
as a "chair" or "boat" shape with hydrogens in either
axial or equatorial positions relative to the molecular
plane.

As used herein, the term "conformation-
dependent property," when used in reference to a ligand,
refers to a characteristic of a ligand that specifically
correlates with the three-dimensional structure of a
ligand or the orientation in space of selected atoms and
bonds of the ligand. Thus, a ligand bound to a
polypeptide in a distinct conformation will have at least
one unique conformation-dependent property correlated
with the bound conformation of the ligand. A
conformation-dependent property can be derived from or
include the entire ligand structure or selected atoms and
bonds, including a fragment or portion of the complete
atomic composition of the ligand. A conformation-
dependent property that includes selected atoms and bonds
of a ligand can include 2 or more, 3 or more, 5 or more,
10 or more, 15 or more, 20 or more, 25 or more, or 50 or
more atoms of a bound conformation of a ligand.

A characteristic that specifically correlates
with a three-dimensional structure of a ligand is a
characteristic that is substantially different between at
least two different bound conformations of the same
ligand and, therefore, distinguishes the two different
bound conformations. A conformation-dependent property
can include a physical or chemical characteristic of a
ligand, for example, absorption and emission of heat,
absorption and emission of electromagnetic radiation,
rotation of polarized light, magnetic moment, spin state
of electrons, or polarity. A conformation-dependent
property can also include a structural characteristic of
a ligand based, for example, on an X-ray diffraction
pattern or a nuclear magnetic resonance (NMR) spectrum.
A conformation-dependent property can additionally
include a characteristic based on a structural model, for
example, an electron density map, atomic coordinates, or
X-ray structure. A conformation-dependent property can
include a characteristic spectroscopic signal based on,
for example, Raman, circular dichroism (CD), optical
rotation, electron paramagnetic resonance (EPR), infrared
(IR), ultraviolet/visible absorbance (UV/Vis),
fluorescence, or luminescence spectroscopies. A
conformation-dependent property can also include a
characteristic NMR signal, for example, chemical shift, J
coupling, dipolar coupling, cross-correlation, nuclear
spin relaxation, transferred nuclear Overhauser effect,
or combinations thereof. A conformation-dependent
property can additionally include a thermodynamic or
kinetic characteristic based on, for example,
calorimetric measurement or binding affinity measurement.
Furthermore, a conformation-dependent property can
include a characteristic based on electrical measurement,
for example, voltammetry or conductance.

As used herein, "selected" conformation-
dependent properties are identified to form a set of
conformation-dependent properties that can include, for
example, the entire set of conformation-dependent
properties associated with the bound conformations of a
ligand in a pharmacocluster or a subset of conformation-
dependent properties associated with the bound
conformations of a ligand in a pharmacocluster, so long
as the subset of conformation-dependent properties is
sufficient to identify a unique conformation of the
ligand. A selected conformation-dependent property can
include any of the above-described properties, for
example, a physical or chemical property, structural
data, a structural model, a spectroscopic signal, a
thermodynamic or kinetic measurement or an electrical
measurement.

As used herein, the term "bound conformation,"
when used in reference to a ligand, refers to the
location of atoms of a ligand relative to each other in
three-dimensional space, where the ligand is bound to a
polypeptide. The location of atoms in a ligand can be
described, for example, according to bond angles, bond
distances, relative locations of electron density,
probable occupancy of atoms at points in space relative
to each other, probable occupancy of electrons at points
in space relative to each other or combinations thereof.
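Bond distances and bond angles, two of the descriptors listed above, follow directly from atomic coordinates. The sketch below also computes torsion (dihedral) angles, a standard descriptor of rotation about a bond that the definition does not name explicitly. The function names are illustrative, and the dihedral uses the common atan2 formulation, which returns values between -180 and 180 degrees.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_distance(a: Vec, b: Vec) -> float:
    """Distance between two atom centers."""
    return math.dist(a, b)


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c at atom b, in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos_theta = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    # Clamp to guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Torsion angle a-b-c-d about the b-c bond, in degrees."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```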

As used herein, a "selected" bound conformation
refers to a set of bound conformations that can include,
for example, the entire set of defined bound
conformations or a subset of bound conformations of a
ligand.

As used herein, the term "clustering" refers to
assigning related bound conformations of a ligand, or
portion thereof, into a first collection such that the
conformations residing in the first collection can be
overlaid with substantial overlap and bound conformations
from two different collections cannot be overlaid with a
better overlap than that resulting from members of the
first collection. Exemplary clustering of ligand
conformations is disclosed herein (see Example I).
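The procedure of Example I is not reproduced in this section. As a stand-in, the sketch below applies single-linkage clustering, a generic technique swapped in for illustration: two conformations fall into the same collection whenever their RMSD over corresponding, already superimposed atoms is at or below a threshold. The threshold value and function names are hypothetical. Single linkage does not by itself guarantee the overlap criterion in the definition, so the resulting collections would still need to be checked against it.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD over corresponding atoms of two superimposed conformations."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_conformations(
    confs: List[Sequence[Coord]], threshold: float
) -> List[List[int]]:
    """Single-linkage clustering with a union-find structure; returns
    groups of conformation indices."""
    parent = list(range(len(confs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(confs)):
        for j in range(i + 1, len(confs)):
            if rmsd(confs[i], confs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(confs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```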
As used herein, the term "ligand" refers to a
molecule that can specifically bind to a polypeptide.
Specific binding, as it is used herein, refers to binding
that is detectable over non-specific interactions by
quantifiable assays well known in the art. A ligand can
be essentially any type of natural or synthetic molecule
including, for example, a polypeptide, nucleic acid,
carbohydrate, lipid, amino acid, nucleotide or any
organic derived compound. The term also encompasses a
cofactor or a substrate of a polypeptide having enzymatic
activity, or a substrate that is inert to catalytic
conversion by the bound polypeptide. Specific binding to
a polypeptide can be due to covalent or non-covalent
interactions.



As used herein, the term "bound to two or more
polypeptides," when used in reference to a ligand, is
intended to refer to two or more complexes consisting of
a ligand and a polypeptide. A complex can include, for
example, a single ligand bound to a single polypeptide.
A complex can also include a single ligand bound to more
than one polypeptide including, for example, a complex
in which a ligand is bound at the interface of
interacting polypeptides. A complex can also include
multiple ligands; however, conformation-dependent
properties of all ligands of the complex need not be
identified. A complex results from a specific
interaction between a polypeptide and a ligand.

As used herein, the term "substantially the
same," when used in reference to bound conformations of a
ligand, or portion thereof, is intended to refer to two
or more bound conformations that can be overlaid upon
each other in three-dimensional space such that all
corresponding atoms between the two conformations are
overlapped. Accordingly, "substantially different" bound
conformations cannot be overlaid upon each other in
three-dimensional space such that all corresponding atoms
between the two bound conformations are overlapped.
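One way to make "all corresponding atoms are overlapped" operational is to require every corresponding atom pair of two superimposed conformations to lie within a distance tolerance. The tolerance below is a hypothetical value, not one given in this section, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def substantially_same(
    a: Sequence[Coord], b: Sequence[Coord], tolerance: float = 1.0
) -> bool:
    """True if every corresponding atom pair of two superimposed
    conformations lies within `tolerance` (hypothetical, in Angstroms)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same atoms")
    return all(math.dist(p, q) <= tolerance for p, q in zip(a, b))
```

A per-atom test is stricter than a single RMSD threshold: for ten atoms where only one is displaced by 2 Angstroms, the RMSD is about 0.63 Angstroms, yet this check reports the conformations as different.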

As used herein, the term "polypeptide" is
intended to refer to a peptide polymer of two or more
amino acids. The term is similarly intended to include
polymers containing amino acid stereoisomers, analogues
and functional mimetics thereof. For example,
derivatives can include chemical modifications of amino
acids such as alkylation, acylation, carbamylation,
iodination, or any modification that derivatizes the
polypeptide. Analogues can include modified amino acids,
for example, hydroxyproline or carboxyglutamate, and can
include amino acids, or analogs thereof, that are not
linked by peptide bonds. Mimetics encompass chemicals
containing chemical moieties that mimic the function of
the polypeptide regardless of the predicted three-
dimensional structure of the compound. For example, if a
polypeptide contains two charged chemical moieties in a
functional domain, a mimetic places two charged chemical
moieties in a spatial orientation and constrained
structure so that the corresponding charge is maintained
in three-dimensional space. Thus, all of these
modifications are included within the term "polypeptide"
so long as the polypeptide retains its binding function.

As used herein, the term "root mean square
deviation," or RMSD, refers to a standard deviation that
quantifies the structural variability in a population of
bound conformations of a ligand. The term is intended to
be consistent with its meaning as understood in the art
as described, for example, in Doucet and Weber, Computer-
Aided Molecular Design: Theory and Applications, Academic
Press, San Diego, CA (1996).
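Read as a standard deviation over a population, the RMSD of M superimposed bound conformations, each with N corresponding atoms, is the square root of the summed squared distances of every atom from its mean position, divided by M times N. This is a minimal sketch of that reading. The superposition step is assumed to have been done already, the function name is hypothetical, and conventions in the cited reference may differ (for example, pairwise RMSD between two conformations rather than deviation from a mean).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def population_rmsd(confs: List[Sequence[Coord]]) -> float:
    """sqrt(sum_k sum_i |x_ki - mean_i|^2 / (M * N)) for M superimposed
    conformations of N corresponding atoms each."""
    m, n = len(confs), len(confs[0])
    # Mean position of each corresponding atom across the population
    mean = [
        tuple(sum(c[i][axis] for c in confs) / m for axis in range(3))
        for i in range(n)
    ]
    total = sum(math.dist(c[i], mean[i]) ** 2 for c in confs for i in range(n))
    return math.sqrt(total / (m * n))
```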

As used herein, the term "family," when used in
reference to characterizing polypeptides having ligand
binding activity, is intended to refer to polypeptides
that can bind to the same ligand, or portion thereof. A
polypeptide family can contain polypeptides having
binding activity for a common ligand with sufficient
affinity, avidity or specificity to allow measurement of
the binding event. As defined herein, a "member" of a
polypeptide family refers to an individual polypeptide
that can be classified in a polypeptide family because
the polypeptide binds a ligand, or portion thereof, that
binds another polypeptide in the polypeptide family. The
bound conformations of a ligand bound by individual
members of a family can be substantially the same or
different from each other.

As used herein, the term "pharmacofamily," when
used in reference to polypeptides, is intended to refer
to polypeptides that can be classified together in a
population because they individually bind a ligand such
that the ligand is bound in substantially the same
conformation. As defined herein, a "member" of a
polypeptide pharmacofamily refers to an individual
polypeptide that is classified in a polypeptide
pharmacofamily because the polypeptide binds a
conformation of a ligand that is substantially the same
as a conformation of the ligand bound to another
polypeptide in the pharmacofamily.

As used herein, the term "grouping" refers to
assigning related polypeptides into a family or
pharmacofamily such that the polypeptide members of a
family bind the same ligand and the polypeptide members
of a pharmacofamily bind substantially the same bound
conformation of a ligand.

As used herein, the term "fold," when used in
reference to a polypeptide, refers to a specific
geometric arrangement and connectivity of a combination
of secondary structure elements in a polypeptide
structure. Secondary structure elements of a polypeptide
that can be arranged into a fold, including, for example,
alpha helices, beta sheets, turns and loops, are well
known in the art. Folds of a polypeptide can be
recognized by one skilled in the art and are described
in, for example, Branden and Tooze, Introduction to
Protein Structure, Garland Publishing, New York (1991)
and Richardson, Adv. Prot. Chem. 34:167-339 (1981).

As used herein, "modeling the three-dimensional
structure," when used in reference to a polypeptide,
refers to determining a conformation for a polypeptide. A
conformation of a polypeptide can be determined, for
example, from empirical data specifying structure or from
a compared conformation used as a template. A
conformation can be determined at any desired level of
resolution sufficient to identify, for example, the overall
shape of a polypeptide, tertiary structure elements,
secondary structure elements, polypeptide backbone
structure, amino acid residue identity or location of
individual atoms.



As used herein, the term "structural model,"
when used in reference to a polypeptide, refers to a
representation of a three-dimensional structure of a
polypeptide. A structural model can be determined from
empirical data derived from, for example, X-ray
crystallography or nuclear magnetic resonance
spectroscopy. A structural model can also be derived
from a theoretical calculation including, for example,
comparison to a known structure or ab initio molecular
modeling. A representation of a structural model can
include, for example, an electron density map, atomic
coordinates, X-ray structure model, ball and stick model,
density map, space filling model, surface map, Connolly
surface, van der Waals surface or CPK model.

As used herein, the term "conformer model"
refers to a representation of points in a defined
coordinate system wherein a point corresponds to a
position of an atom in a bound conformation of a ligand.
The coordinate system is preferably in 3 dimensions;
however, manipulation or computation of a model can be
performed in 2 dimensions or even 4 or more dimensions in
cases where such methods are preferred. A point in the
representation of points can, for example, correlate with
the center of an atom. Additionally, a point in the
representation of points can be incorporated into a line,
plane or sphere to include a shape of one or more atoms or
a volume occupied by one or more atoms. A conformer model
can be derived from 2 or more bound conformations of a
ligand. For example, a conformer model can be generated
from 3 or more, 4 or more, 5 or more, 6 or more, 7 or
more, 8 or more, 10 or more, 15 or more, 20 or more or 25
or more bound conformations of a ligand.



As used herein, the term "average structure,"
when used in reference to bound conformations of a ligand
in a pharmacocluster, refers to a conformer model derived
by superimposing the bound conformations of a ligand in a
pharmacocluster and determining an average location in
space for corresponding atoms.
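The average structure defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the Examples: it assumes the bound conformations have already been superimposed in a common frame (for example by a least-squares fit) and that atoms are listed in the same order in every conformation. The function names are hypothetical.

```python
import numpy as np

def average_structure(conformations):
    """Average location of corresponding atoms across bound conformations.

    conformations: array-like of shape (n_conformations, n_atoms, 3), in Å,
    assumed already superimposed, with atoms in the same order throughout.
    """
    coords = np.asarray(conformations, dtype=float)
    if coords.ndim != 3 or coords.shape[2] != 3:
        raise ValueError("expected shape (n_conformations, n_atoms, 3)")
    return coords.mean(axis=0)

def rmsd_to_average(conformations):
    """RMSD of each conformation to the average structure (no refitting)."""
    coords = np.asarray(conformations, dtype=float)
    avg = average_structure(coords)
    # Squared distance per atom, averaged over atoms, then square root.
    return np.sqrt(((coords - avg) ** 2).sum(axis=2).mean(axis=1))
```

Because the average is taken over already superimposed coordinates, the per-conformation RMSD values returned here are the kind of quantity that can be compared against a pharmacocluster threshold.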

As used herein, the term "pharmacophore model"
refers to a representation of points in a defined
coordinate system wherein a point corresponds to a
position or other characteristic of an atom or chemical
moiety in a bound conformation of a ligand and/or an
interacting polypeptide or ordered water. An ordered
water is an observable water in a model derived from
structural determination of a polypeptide. A
pharmacophore model can include, for example, atoms of a
bound conformation of a ligand, or portion thereof. A
pharmacophore model can include both the bound
conformation of a ligand, or portion thereof, and one or
more atoms that both interact with the ligand and are
from a bound polypeptide. Thus, in addition to geometric
characteristics of a bound conformation of a ligand, a
pharmacophore model can indicate other characteristics
including, for example, charge or hydrophobicity of an
atom or chemical moiety. A pharmacophore model can
incorporate internal interactions within the bound
conformation of a ligand or interactions between a bound
conformation of a ligand and a polypeptide or other
receptor including, for example, van der Waals
interactions, hydrogen bonds, ionic bonds, and
hydrophobic interactions. A pharmacophore model can be
derived from 2 or more bound conformations of a ligand.
For example, a pharmacophore model can be generated from
3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8
or more, 10 or more, 15 or more, 20 or more or 25 or more
bound conformations of a ligand.

A point in a pharmacophore model can, for
example, correlate with the center of an atom or moiety.
Additionally, a point in the representation of points can
be incorporated into a line, plane or sphere to indicate
a characteristic other than a center of an atom or moiety
including, for example, shape of an atom or moiety or
volume occupied by an atom or moiety. The coordinate
system of a pharmacophore model is preferably in 3
dimensions; however, manipulation or computation of a
model can be performed in 2 dimensions or even 4 or more
dimensions in cases where such methods are preferred.
Multidimensional coordinate systems in which a
pharmacophore model can be represented include, for
example, Cartesian coordinate systems, fractional
coordinate systems, or reciprocal space. The term
pharmacophore model is intended to encompass a conformer
model.
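One way to hold a pharmacophore model in software is as a list of spheres, each tagged with a characteristic such as charge or hydrophobicity. The sketch below is hypothetical: the class `PharmacophorePoint`, the feature labels and the "any matching atom inside the sphere" test are illustrative assumptions, and it does not represent lines, planes or explicit interactions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class PharmacophorePoint:
    # Hypothetical representation: a sphere with a center and radius (Å)
    # plus a characteristic label such as "hbond_donor" or "hydrophobic".
    center: tuple
    radius: float
    feature: str

def point_satisfied(point, atoms):
    """True if any (xyz, feature) atom of the same feature lies inside the sphere."""
    for xyz, feature in atoms:
        if feature == point.feature and math.dist(xyz, point.center) <= point.radius:
            return True
    return False

def matches_model(model, atoms):
    """True if every point of the pharmacophore model is satisfied."""
    return all(point_satisfied(p, atoms) for p in model)
```

A candidate ligand conformation, supplied as a list of `((x, y, z), feature)` pairs, can then be screened against the model.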

As used herein, the term "moiety" refers to a
group of atoms that form a part or portion of a larger
molecule. A moiety can consist of any number of atoms in
a portion of a ligand and can correlate with a physical
or chemical property conferred upon the ligand by the
combined atoms. Exemplary moieties of a nicotinamide
adenine dinucleotide ligand include a phosphate,
nicotinamide ring, amino group, amide group or ribose
ring. In addition, a nicotinamide adenine dinucleotide
group can be a moiety. For example, the nicotinamide
adenine dinucleotide portion, exclusive of the 2'P
phosphate, can be a moiety of a nicotinamide adenine
dinucleotide phosphate molecule (see Figure 2 for the
location of the 2'P phosphate in nicotinamide adenine
dinucleotide phosphate).

As used herein the term "sequence model" refers
to a mathematical representation of the frequency and
order with which specific monomeric units or gaps occur
in a set of polymers. The mathematical representation
can include a probability of a given monomer occurring at
a position in the sequence model. A probability of a
given monomer occurring at a position in the sequence
model can be independent of other positions or can depend
on the occupancy at any or all other positions in the
sequence model. An example of a position independent
sequence model is a Hidden Markov Model as described
below. An example of a position dependent sequence model
is a sequence model with positions 1 through 10, where
the occupancy at each position is modeled
probabilistically. In a sequence model such as this, the
probability that a specific monomer occurs at position 1
can vary based on the identity of the monomers that
occupy other positions such as 2, 8, and/or 9. A polymer
included in the term can be, for example, a polypeptide
or polynucleotide. A sequence of a polypeptide that is
useful in the methods of the invention can be represented
by amino acids or by nucleotides, such as codons, encoding
amino acids of the polypeptide. A sequence of a
polypeptide that is useful in the methods of the
invention includes a full sequence, or a portion thereof,
including, for example, a domain, region or residues
separated by gaps in the full sequence.
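A simple position independent sequence model can be illustrated with a per-position frequency profile built from aligned sequences. This is a minimal sketch, not a Hidden Markov Model: it has no insert or delete states, the 20-letter alphabet plus `-` for gaps and the pseudocount of 1 are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"

def build_profile(aligned_seqs, alphabet=AMINO_ACIDS + GAP, pseudocount=1.0):
    """Per-position probability of each monomer (or gap) in aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        total = len(aligned_seqs) + pseudocount * len(alphabet)
        # Pseudocounts keep unseen monomers from having zero probability.
        profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profile

def log_probability(profile, seq):
    """Log probability of an aligned sequence, treating positions independently."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(math.log(column[ch]) for column, ch in zip(profile, seq))
```

Because each position is scored on its own, the total is a simple sum of per-position log probabilities; a position dependent model would instead condition each term on the monomers at other positions.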

As used herein the term "differential," when
used in reference to sequence models, refers to a
relationship between sequence models where a first
sequence model represents a frequency with which specific
monomeric units occur at a first set of positions in a
polymer and a second sequence model represents the
frequency with which specific monomeric units occur at a
second set of positions in the same polymer. Sequence
models that are differential with respect to each other
can be produced from different subsets of monomeric units
and/or have different parameters. For example, two
sequence models that are differential with respect to
each other can both be position dependent and produced
from different training sets; can both be position
independent and produced from different training sets;
can be one position dependent and one position
independent, both produced from the same training set;
or can be one position dependent and one position
independent, each produced from a different training set.
Positions and frequencies can be represented redundantly
in a first sequence model and a second, differential
sequence model so long as the set of positions or
frequencies in the first model contains at least one
position or frequency that is not present in the set of
the differential model.

As used herein the term "relationship," when
used in reference to a sequence and a sequence model,
refers to a comparison of the presence, absence or
identities of monomers at various positions in a polymer
sequence and sequence model. The term includes
comparison of the presence, absence or identities of
amino acids in a polypeptide sequence and a sequence
model or comparison of the presence, absence or
identities of nucleotides in a polynucleotide sequence
and a sequence model.

As used herein the term "correspondence," when
used in reference to a sequence and a sequence model,
refers to a statistically relevant similarity between the
sequence and the sequence model. A statistically
relevant similarity can be indicated by a low expectation
value (E value) or a high bit score. The E value is
understood in the art to be the statistically determined
number of sequences that would be found, by searching a
database with a random model, to match the random model
as well as or better than the sequence retrieved by
searching the database with a trained model matches the
trained model, as described in Durbin et al.,
Biological Sequence Analysis, Cambridge University Press
(1998). A sequence having a statistically relevant
similarity to a sequence model can have an E value less
than a cutoff E value or, equivalently, a -ln(E) greater
than -ln of the cutoff E value. A cutoff E value can be
at a specified threshold value of E including, for
example, 100, 50, 10, 5, 2, 1, 0.5, 0.2, 0.1, or 0.01
that can be identified according to methods described
below. The bit score is understood in the art to be a
measure of the probability that the sequence belongs to
the set of polypeptides used to train the model, as
described in Durbin et al., supra.
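The E value definition above can be illustrated with an empirical estimate. This sketch assumes a collection of scores obtained with a random (null) model is available to estimate the chance of scoring at least as well, and that the database size is the number of sequences searched. It is not how any particular search program computes E values; such programs typically fit an analytic score distribution instead. The function names are hypothetical.

```python
import math

def empirical_e_value(score, null_scores, database_size):
    """Expected number of random sequences scoring at least `score`.

    null_scores: scores from searching with a random (null) model, used to
    estimate P(score >= s); database_size: number of sequences searched.
    """
    if not null_scores:
        raise ValueError("need at least one null score")
    exceed = sum(1 for s in null_scores if s >= score)  # "as well or better"
    p_value = exceed / len(null_scores)
    return p_value * database_size

def passes_cutoff(e_value, cutoff=0.01):
    """E below the cutoff, equivalently -ln(E) above -ln(cutoff)."""
    if e_value == 0:
        # A finite null sample can give E = 0; treat it as passing.
        return True
    return -math.log(e_value) > -math.log(cutoff)
```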

As used herein the term "selected distance,"
when used in reference to a polypeptide, refers to a
length separating locations in a polypeptide and/or
separating locations in a polypeptide and bound ligand.
A location in a polypeptide can include, for example, an
amino acid location, an atom location, or location
identified relative to an amino acid such as a center of
gravity or center of a volume occupied by the amino acid.
A location in a bound ligand can include, for example, a
moiety location, an atom location, or location identified
relative to the bound ligand, or moiety thereof, such as a
center of gravity or center of an occupied volume. A
length separating two locations can be a length between
points in a three dimensional structure including, for
example, a length of a line drawn between locations in a
high resolution structure model or a length measured by
spectroscopic means such as an NOE method. A length
separating two locations can be a length between points
in a primary sequence of a polypeptide including, for
example, a number of amino acids separating two points, a
number of atoms separating two points, or calculated
distances thereof based on theoretical bond lengths.
Additionally, a selected distance can include a
combination of lengths determined in a 3 dimensional
structure and primary sequence. For example, amino acids
within a selected distance can include a first subset of
those within an identified length from a bound ligand in
the 3 dimensional structure and a second subset
containing others within an identified number of amino
acids, in the primary sequence, from those in the first
subset.
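The combined 3 dimensional and primary-sequence selection described above can be sketched as follows. The 4.0 Å cutoff and the window of 2 residues are hypothetical defaults chosen for illustration, residues are assumed to be keyed by their position in the primary sequence, and the function name is hypothetical.

```python
import math

def residues_within_selected_distance(residue_atoms, ligand_atoms,
                                      cutoff=4.0, sequence_window=2):
    """Residues near a bound ligand, combining 3D and primary-sequence criteria.

    residue_atoms: dict mapping residue index (position in the primary
    sequence) to a list of (x, y, z) atom coordinates in Å.
    ligand_atoms: list of (x, y, z) ligand atom coordinates in Å.
    Returns (first_subset, second_subset) as sets of residue indices.
    """
    # First subset: residues with any atom within `cutoff` Å of any ligand atom.
    first = {
        res for res, atoms in residue_atoms.items()
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_atoms)
    }
    # Second subset: other residues within `sequence_window` positions of the first.
    second = {
        res for res in residue_atoms
        if res not in first and any(abs(res - f) <= sequence_window for f in first)
    }
    return first, second
```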

The invention provides a method for identifying
a pharmacocluster. The method includes the steps of (a)
determining bound conformations of a ligand bound to
different polypeptides, and (b) clustering two or more
bound conformations of the ligand having substantially
the same bound conformation, thereby identifying a
pharmacocluster. The invention also provides a method
for identifying a member of a pharmacocluster. The
method includes the steps of (a) determining a bound
conformation of a ligand bound to a polypeptide; and (b)
determining a pharmacocluster having substantially the
same bound conformation as the bound conformation,
thereby identifying the bound conformation of the ligand
as a member of the pharmacocluster.

A bound conformation of a ligand bound to a
polypeptide can be determined from a previously observed
molecular structure or from data specifying a molecular
structure for a bound conformation of a ligand.
Previously observed structures can be acquired for use in
the invention by searching a database of existing
structures. An example of a database that includes
structures of bound conformations of ligands bound to
polypeptides is the Protein Data Bank (PDB, operated by
the Research Collaboratory for Structural Bioinformatics;
see Berman et al., Nucleic Acids Research 28:235-242
(2000)). A database can be searched, for example, by
querying based on chemical property information or on
structural information. In the latter approach, an
algorithm based on finding a match to a template can be
used as described, for example, in Martin, "Database
Searching in Drug Design," J. Med. Chem. 35:2145-2154
(1992).



A bound conformation of a ligand bound to a
polypeptide can be determined from an empirical
measurement or from a database. Data specifying a
structure can be acquired using any method available in
the art for structural determination of a ligand bound to
a polypeptide. For example, X-ray crystallography can be
performed with a crystallized complex of a polypeptide
and ligand to determine a bound conformation of the
ligand bound to the polypeptide. Methods for obtaining
such crystal complexes and determining structures from
them are well known in the art as described, for example,
in McRee et al., Practical Protein Crystallography,
Academic Press, San Diego (1993); Stout and Jensen, X-ray
Structure Determination: A Practical Guide, 2nd Ed., Wiley,
New York (1989); and McPherson, The Preparation and
Analysis of Protein Crystals, Wiley, New York (1982).

Another method useful for determining a bound
conformation of a ligand bound to a polypeptide is
Nuclear Magnetic Resonance (NMR). NMR methods are well
known in the art and include those described, for example,
in Reid, Protein NMR Techniques, Humana Press, Totowa NJ
(1997); and Cavanagh et al., Protein NMR Spectroscopy:
Principles and Practice, ch. 7, Academic Press, San Diego
CA (1996).

A bound conformation of a ligand can also be
determined from a hypothetical model. For example, a
hypothetical model of a bound conformation of a ligand
can be produced using an algorithm which docks a ligand
to a polypeptide of known structure and fits the ligand
to the polypeptide binding site. Algorithms available in
the art for fitting a ligand structure to a polypeptide
binding site include, for example, DOCK (Kuntz et al.,
J. Mol. Biol. 161:269-288 (1982)) and INSIGHT98 (Molecular
Simulations Inc., San Diego, CA).
A molecular structure can be conveniently
stored and manipulated using structural coordinates.
Structural coordinates can occur in any format known in
the art so long as the format can provide an accurate
reproduction of the observed structure. For example,
crystal coordinates can occur in a variety of file types
including, for example, .fin, .df, .phs, or .pdb as
described, for example, in McRee, supra. Although the
examples above describe structural coordinates derived
from X-ray crystallographic analysis or NMR spectroscopy,
one skilled in the art will recognize that structural
coordinates can be derived from any method known in the
art to determine a bound conformation of a ligand bound
to a polypeptide.
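As an illustration of reading a bound ligand conformation from .pdb coordinates, the sketch below pulls HETATM records for one residue name using the fixed PDB column layout (atom name in columns 13-16, residue name in 18-20, x, y and z in 31-38, 39-46 and 47-54). It is a simplified, hypothetical helper: it ignores alternate locations, multiple models (as in NMR entries), chain selection and multiple copies of the ligand. For example, `"NAD"` is used as the residue name in the usage below.

```python
def ligand_coordinates(pdb_lines, residue_name):
    """Return [(atom_name, (x, y, z)), ...] for HETATM records of one residue name."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        if line[17:20].strip() != residue_name:   # residue name, columns 18-20
            continue
        atom_name = line[12:16].strip()           # atom name, columns 13-16
        # x, y, z occupy columns 31-38, 39-46 and 47-54 (8 characters each).
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        coords.append((atom_name, (x, y, z)))
    return coords
```

With a real entry, `ligand_coordinates(open(path).read().splitlines(), "NAD")` would collect every NAD atom in the file, which is why a fuller version would also filter by chain and residue number.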

Structures at atomic level resolution can be
useful in the methods of the invention. Resolution, when
used to describe molecular structures, refers to the
minimum distance that can be resolved in the observed
structure. Thus, resolution where individual atoms can
be resolved is referred to in the art as atomic
resolution. Resolution is commonly reported as a
numerical value in units of Angstroms (Å, 10^-10 meter)
correlated with the minimum distance which can be
resolved, such that smaller values indicate higher
resolution. Bound conformations of a ligand useful in
the methods of the invention can have a resolution better
than about 10 Å, 5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.5 Å, 1.0 Å,
0.8 Å, 0.6 Å, 0.4 Å, or about 0.2 Å or better.
Resolution can also be reported as an all atom RMSD as
used, for example, in reporting NMR data. Bound
conformations of a ligand useful in the methods of the
invention can have an all atom RMSD better than about
10 Å, 5 Å, 3 Å, 2.5 Å, 2.0 Å, 1.5 Å, 1.0 Å, 0.8 Å, 0.6 Å,
0.4 Å, or about 0.2 Å or better.

An advantage of the methods of the invention is
that a structure of a polypeptide bound to a bound
conformation of a ligand need not be determined to
identify a pharmacocluster. Thus, methods that detect
only the structure of the ligand can be used in the
invention. Additionally, in some cases determination or
refinement of only the structure of the ligand in a
polypeptide-ligand complex will be required. Methods
that can be used to determine a conformation-dependent
property of a ligand in a polypeptide-ligand complex
without determining the structure of the polypeptide
include, for example, Electron Nuclear Double Resonance
spectroscopy (ENDOR, as described in Van Doorslaer and
Schweiger, Naturwissenschaften 87:245-55 (2000)), Electron
Paramagnetic Resonance spectroscopy (EPR, described in
Cantor and Schimmel, Biophysical Chemistry. Part I: The
Conformation of Biological Macromolecules, W. H. Freeman
and Company (1980)), chemically induced dynamic nuclear
polarization (CIDNP, described in Siebert et al.,
Glycoconj. J. 14:945-9 (1997) and Consonni et al., FEBS
Lett. 372:135-9 (1995)), solid state NMR (described in
Mehring, M., High Resolution NMR Spectroscopy in
Solids, 2nd ed., Springer-Verlag, Berlin (1983)) and
liquid phase NMR (described in Wüthrich, NMR of Proteins
and Nucleic Acids, John Wiley & Sons, Inc. (1986)). Thus,
the invention can be performed in a manner whereby the
time and cost associated with a full determination of a
polypeptide structure are avoided.
Any representation that correlates with the
structure of a bound conformation of a ligand can be used
in the methods of the invention. For example, a
convenient and commonly used representation is a
displayed image of the structure. Displayed images that
are particularly useful for determining the bound
conformation of a ligand bound to polypeptides include,
for example, ball and stick models, density maps, space
filling models, surface maps, Connolly surfaces, Van der
Waals surfaces or CPK models. Display of images as a
computer output, for example, on a video screen can be
advantageous as described below.

Clustering can be performed with any ligand and
any number of bound conformations of a ligand. The
methods of the invention can be performed by clustering 2
or more bound conformations of a ligand. For example,
clustering can be performed with 3 or more, 4 or more, 5
or more, 6 or more, 7 or more, 8 or more, 9 or more, 10
or more, 11 or more, 12 or more, 13 or more, 14 or more,
15 or more or 20 or more bound conformations of a ligand.
The methods of the invention can be used with any number
of bound conformations of a ligand. Due to the large sizes
of data sets required to represent bound conformations of
a ligand, methods of clustering bound conformations are
generally performed on a computer. The methods are
compatible with any computer that can support molecular
modeling software including, for example, a personal
computer, Silicon Graphics workstation, or supercomputer.
A variety of computer software programs are available for
molecular modeling including, for example, GRASP
(Nicholls, A., supra), ALADDIN (Van Drie et al., supra),
INSIGHT98 (Molecular Simulations Inc., San Diego CA),
RASMOL (Sayle et al., Trends Biochem. Sci. 20:374-376
(1995)) and MOLMOL (Koradi et al., J. Mol. Graphics
14:51-55 (1996)).



Once bound conformations of a ligand bound to
different polypeptides have been determined, two or more
bound conformations of the ligand can be compared and
those having substantially the same bound conformation
can be clustered. Methods of comparison include, for
example, a method that provides alignment of two or more
bound conformations of a ligand and evaluation of the
degree of overlap in the aligned structures. Methods of
comparison can be performed in an iterative fashion until
a best fit is identified.
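The alignment and overlap evaluation described above can be sketched with a least-squares superposition followed by an RMSD. The sketch swaps in the Kabsch (SVD) solution, which gives the optimal rotation in closed form rather than by iteration; it is an illustrative substitute, not the ORIENT algorithm used by COMPARE. Atoms are assumed to be listed in corresponding order in both conformations, and `superposed_rmsd` is a hypothetical name.

```python
import numpy as np

def superposed_rmsd(a, b):
    """RMSD between two conformations after optimal least-squares superposition.

    a, b: arrays of shape (n_atoms, 3) with atoms in corresponding order.
    Uses the Kabsch (SVD) solution for the optimal proper rotation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("conformations must have the same atoms")
    a_c = a - a.mean(axis=0)          # remove translation
    b_c = b - b.mean(axis=0)
    h = a_c.T @ b_c                   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid an improper (mirror) fit
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    a_fit = a_c @ rotation.T          # rotate a onto b
    return float(np.sqrt(((a_fit - b_c) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a conformation scores an RMSD of essentially zero, so only genuine conformational differences remain in the value.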

Methods of comparing bound conformations of
bound ligands include, for example, cluster analysis,
visual inspection and pairwise structural comparisons.
Cluster analysis is commonly performed by, but not
limited to, partitioning methods or hierarchical methods
as described, for example, in Kaufman and Rousseeuw,
Finding Groups in Data: An Introduction to Cluster
Analysis, John Wiley and Sons Inc., New York (1990).
Partitioning methods that can be used include, for
example, partitioning around medoids, clustering large
applications, and fuzzy analysis, as described in
Kaufman and Rousseeuw, supra. Hierarchical methods
useful in the invention include, for example,
agglomerative nesting, divisive analysis, and monothetic
analysis, as described in Kaufman and Rousseeuw, supra.
Algorithms for cluster analysis of molecular structures
are known in the art and include, for example, COMPARE
(Chiron Corp, 1995; distributed by Quantum Chemistry
Program Exchange, Indianapolis IN). COMPARE can be used
to make all possible pairwise comparisons between a set
of conformations of the same ligand(s). COMPARE reads
PDB files and uses a Ferro-Hermans ORIENT algorithm for
a least squares root mean square (RMS) fit. The
structures can be clustered into groups using the
Jarvis-Patrick nearest neighbors algorithm. Based on the
RMS deviation between ligand conformers, a list of
"nearest neighbors" is generated for each conformer. Two
conformers are then grouped together, or clustered, if
(1) the RMS deviation is sufficiently small and (2) both
conformers share a determined number of common
"neighbors." Both criteria are adjusted by the program
to generate clusters based on a user defined cutoff for
distance between individual clusters. Follow-up analysis
was conducted using InsightII to verify clusters. A
member conformation is identified as being closer to the
averaged coordinates of conformations within its family
than to the averaged coordinates of any other family.
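The Jarvis-Patrick grouping described above can be sketched from a precomputed pairwise RMSD matrix. This is a hypothetical reimplementation for illustration, not the COMPARE code: the neighbor-list length `k`, the required number of shared neighbors `min_shared` and the pairwise `rmsd_cutoff` are assumed parameters, and pairs are merged transitively with a union-find structure.

```python
import numpy as np

def jarvis_patrick(rmsd, k=3, min_shared=2, rmsd_cutoff=1.1):
    """Cluster conformers from a symmetric RMSD matrix (Jarvis-Patrick style).

    Two conformers are joined if (1) their RMSD is at most `rmsd_cutoff`,
    (2) each is among the other's k nearest neighbors, and (3) their
    neighbor lists share at least `min_shared` members.
    Returns a cluster label (0, 1, ...) for each conformer.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    n = rmsd.shape[0]
    # k nearest neighbors of each conformer, excluding itself.
    neighbors = []
    for i in range(n):
        order = [int(j) for j in np.argsort(rmsd[i]) if j != i]
        neighbors.append(set(order[:k]))

    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if (rmsd[i, j] <= rmsd_cutoff
                    and j in neighbors[i] and i in neighbors[j]
                    and len(neighbors[i] & neighbors[j]) >= min_shared):
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # Relabel roots as 0, 1, ... in order of first appearance.
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Raising `min_shared` or lowering `rmsd_cutoff` splits the data into more, tighter clusters, which mirrors the adjustable criteria described for COMPARE.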

Using methods such as those described above,
one skilled in the art will know how to identify
conformations that are substantially the same. For
example, similarity can be evaluated according to the
goodness of fit between two or more bound conformations
of a ligand. Goodness of fit can be represented by a
variety of parameters known in the art including, for
example, the root mean square deviation (RMSD). A lower
RMSD between structures correlates with a better fit
compared to a higher RMSD between structures. Bound
conformations of a ligand having substantially the same
conformation can be identified, for example, by comparing
mean RMSD values within and between pharmacoclusters, as
demonstrated in Example I. Accordingly, bound
conformations of a ligand having substantially the same
conformation can have a mean RMSD compared to an average
structure for the pharmacocluster that is less than
1.1 Å. Two or more bound conformations of a ligand can be
clustered by assigning bound conformations of a ligand
into a collection such that the conformations of a ligand
residing in the collection are substantially the same.
Members of a pharmacocluster can also be identified as
having RMSD values compared to an average structure for
the pharmacocluster that are less than 1.0 Å, 0.9 Å,
0.8 Å, 0.7 Å, 0.6 Å, 0.5 Å, 0.4 Å, 0.3 Å, 0.2 Å or 0.1 Å.

A bound conformation of a ligand that is a
member of a pharmacocluster can also be identified by
comparing the RMSD for the bound conformation to an
average conformation of the members in multiple
pharmacoclusters. Using this value for comparison, a
member conformation is identified as having a smaller
RMSD when compared to the averaged coordinates of
conformations within its family than when compared to the
averaged coordinates of any other family. In addition, a
member of a pharmacocluster can be identified as having
an RMSD compared to an average conformation of the
members in a pharmacocluster that is smaller than the
RMSD between each family's average coordinates. For
example, as described in Example I, RMSD values for
members of pharmacoclusters 1-8 as presented in Tables
3A, 4A, 5A, 6A, 7A, 8A, 9A or 10A, respectively, can be
compared to RMSD values between each pharmacocluster as
presented in Table 2. Comparisons similar to those
described above can be made for bound conformations of
any ligand according to the methods described in the
Examples.
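The membership tests above can be combined in a short sketch: find the family average closest to a conformation, then accept the assignment only if that RMSD is below a cutoff and smaller than the RMSD between that average and every other family average. The sketch assumes all conformations and averages are already superimposed in a common frame with corresponding atom order; the 1.1 Å default follows the value given above, and the function names are hypothetical.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two conformations already superimposed in a common frame."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def assign_to_pharmacocluster(conformation, cluster_averages, cutoff=1.1):
    """Return the name of the cluster whose average is closest, or None.

    cluster_averages: dict mapping cluster name to average coordinates
    of shape (n_atoms, 3). The assignment is accepted only if the RMSD to
    the nearest average is below `cutoff` (Å) and smaller than the RMSD
    between that average and each other cluster average.
    """
    scores = {name: rmsd(conformation, avg) for name, avg in cluster_averages.items()}
    best = min(scores, key=scores.get)
    between = [rmsd(cluster_averages[best], avg)
               for name, avg in cluster_averages.items() if name != best]
    if scores[best] < cutoff and all(scores[best] < d for d in between):
        return best
    return None
```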

In addition, bound conformations of a ligand
can be compared with respect to dihedral angles at
particular bonds. Exemplary methods for comparing
dihedral angles between pharmacoclusters are described in
Example I and Table 1. Comparison between dihedral
angles can be used, for example, in combination with
overall RMSD comparisons such as those described above.
Therefore, bound conformations that are not easily
distinguished by comparison of overall RMSD alone can be
distinguished according to the combined comparison of
RMSD and dihedral angle. Bound conformations of a ligand
that are members of different pharmacoclusters can have
dihedral angles that differ, for example, by at least
about 10 degrees, 30 degrees, 45 degrees, 90 degrees or
180 degrees.
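A dihedral angle about a bond can be computed from the coordinates of four consecutive atoms, and two angles compared with wrap-around at 360 degrees. The sketch below uses a standard projection formula; which four atoms define each ligand torsion is left to the user, and the function names are hypothetical.

```python
import math
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)
```

For example, torsions of 170 and -170 degrees differ by only 20 degrees once wrap-around is taken into account, so a naive subtraction would overstate the difference.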

The invention also provides a pharmacocluster
selected from the group consisting of pharmacocluster
1, pharmacocluster 2, pharmacocluster 3, pharmacocluster
4, pharmacocluster 5, pharmacocluster 6, pharmacocluster
7, and pharmacocluster 8, correlated with the
pharmacofamilies listed in Table 11.

Pharmacoclusters 1 through 8 contain bound
conformations of NAD(P)(H) determined from structures
deposited in the PDB for NAD(P)(H) bound to
oxidoreductase polypeptides. Pharmacoclusters are shown
in Figure 1 and described in further detail in Example I.
The pharmacoclusters of Figure 1 display substantial
overlap between bound conformations of NAD(P)(H) within
each cluster, as can be identified by visual inspection of
the structures. Quantitative comparison of the bound
conformations in each pharmacocluster demonstrates that,
in each pharmacocluster, the RMSD between each
conformation of NAD(P)(H) and the average bound
conformation for the cluster is less than about 1.1 Å, as
described in Example I.

Pharmacoclusters can be used to identify a
ligand having specificity for one or more polypeptide
pharmacofamilies (see Example V). As described herein, a
pharmacophore model or conformer model can be derived
from one or more clusters. These models can be used to
identify a ligand having specificity for one or more
pharmacofamilies of oxidoreductases, for example, by
using the model to query a database of molecules for a
potential ligand or by using the model to guide the
design of a synthetic ligand. An example of using a
pharmacophore of the invention to identify a binding
compound is provided in Example VI.

Pharmacoclusters, including, for example,
pharmacoclusters 1 through 8, can also be used to identify
a new polypeptide member of a polypeptide pharmacofamily.
Using the methods described herein, for example, a
pharmacocluster can be used to produce a pharmacophore
model or conformer model to which a bound conformation of
a ligand can be compared. A polypeptide bound to a bound
conformation of a ligand that is similar to the model can
be classified into an appropriate polypeptide
pharmacofamily based on this comparison. By a similar
method, a bound conformation of a ligand can be directly
compared to a pharmacocluster to classify the polypeptide
bound to the conformation of a ligand into an appropriate
pharmacofamily.

The methods of the invention can also be used
with a portion of a bound conformation of a ligand to
identify a pharmacocluster. The method consists of (a)
determining a bound conformation of a ligand, or portion
thereof, bound to two or more polypeptides, and (b)
clustering two or more bound conformations of the ligand,
or portion thereof, having substantially the same bound
conformation, thereby identifying a pharmacocluster.

A bound conformation of a portion of a ligand
can include selected atoms and/or bonds of a ligand and
can include, for example, a continuous sequence of atoms
and/or bonds or a discontinuous sequence of selected
atoms and/or bonds that, when described independent of
the complete ligand structure, may not appear to be
attached to each other. Such a portion can include 2 or
more atoms of a bound conformation of a ligand or 3 or
more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or
more, 9 or more, 10 or more, 15 or more, 20 or more, 25
or more or 50 or more atoms of a bound conformation of a
ligand. A bound conformation of a portion of a ligand
bound to a polypeptide can be identified according to the
same methods described above for identifying a bound
conformation of a ligand bound to a polypeptide. Two or
more bound conformations of a portion of a ligand can be
clustered as described above so long as the bound
conformations that are clustered correspond to bound
portions of the ligand having the same structural
formula. For example, in a case where determination of
the complete structure of a ligand has not been achieved,
a bound conformation of a portion of the ligand
corresponding to the structurally determined portion can
be used in the methods of the invention.

A pharmacocluster can include portions of bound
conformations derived from different ligands so long as
the portions have a core bound conformation that is
substantially the same. For example, portions having the
same structural formula and bond configuration can share
a core bound conformation. The bond configuration
describes the relative position of atoms attached to a
chiral atom of a ligand. Accordingly, R and S
stereoisomers of a chiral atom have different bond
configurations. Other terms used in the art to designate
different bond configurations include, for example, cis
and trans configurations of atoms attached to carbons
that are double bonded, or Z and E configurations of
atoms attached to carbons that are double bonded. An
example of portions of ligands having the same structural
formula and bond configuration that can share a core
bound conformation are the nicotinamide adenine
dinucleotide portions of nicotinamide adenine
dinucleotide phosphate (NADP) and nicotinamide adenine
dinucleotide (NAD). Additionally, portions of ligands
having different charge, atom substitution or bond
hybridization can share a core bound conformation. An
example of portions of ligands having different charge
and bond hybridization that can share a core bound
conformation are the nicotinamide adenine dinucleotide
portions of oxidized nicotinamide adenine dinucleotide
(NAD) and reduced nicotinamide adenine dinucleotide
(NADH). In cases where the core structures of two
ligands bind with substantially the same conformation to
polypeptides, the core bound conformations can be
clustered according to the methods of the invention (see
Example I).

Substantially the same bound conformation of a portion of a bound conformation of a ligand, including non-contiguous atoms, can be identified by root mean square deviation (RMSD) and compared directly. Conformations of portions having different numbers of atoms can also be compared by root mean square deviation per equivalent atom (RMSD/N, where N is the number of atoms compared). A lower value of RMSD/N indicates greater similarity between the two or more bound ligand conformations that are clustered. One skilled in the art will know that RMSD/N has a compensational origin, so the effect of N must be considered when comparing RMSD/N between pharmacoclusters having different values of N. For example, the lower the value of RMSD/N, the lower the value of N should be to indicate substantial similarity.
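The RMSD and RMSD/N comparison described above can be sketched in Python as follows. This is a minimal illustration only, assuming the two conformations have already been optimally superimposed and that their atoms are listed in one-to-one correspondence (equivalent atoms in the same order); the superposition step itself is not shown. The function names `rmsd` and `rmsd_per_atom` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation over N equivalent atoms.

    Assumes a[i] and b[i] are the same atom and that the two
    conformations have already been superimposed.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("conformations must list the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def rmsd_per_atom(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD/N, where N is the number of atoms compared."""
    return rmsd(a, b) / len(a)


# Hypothetical four-atom portion; the second copy is shifted 0.5 Å along y.
first = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
second = [(x, y + 0.5, z) for x, y, z in first]
print(rmsd(first, second))           # 0.5
print(rmsd_per_atom(first, second))  # 0.125
```

Because RMSD/N divides by the number of atoms, the same RMSD yields a smaller RMSD/N for a larger portion, which is why N has to be taken into account when comparing pharmacoclusters.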

The invention can be used with any ligand for which bound conformations of the ligand bound to different polypeptides can be determined, including, for example, chemical or biological molecules such as simple or complex organic molecules, metal-containing compounds, carbohydrates, peptides, peptidomimetics, lipids, nucleic acids, and the like.

In one embodiment, the compositions and methods of the invention can be used with a ligand that is a nucleotide derivative including, for example, a nicotinamide adenine dinucleotide-related molecule. Nicotinamide adenine dinucleotide-related (NAD-related) molecules that can be used in the methods of the invention can be selected from the group consisting of oxidized nicotinamide adenine dinucleotide (NAD+), reduced nicotinamide adenine dinucleotide (NADH), oxidized nicotinamide adenine dinucleotide phosphate (NADP+), and reduced nicotinamide adenine dinucleotide phosphate (NADPH). An NAD-related molecule can also be a mimetic of the above-described molecules. Use of an NAD-related molecule to identify pharmacoclusters is described in Example I.

A mimetic is a molecule that has at least one function that is substantially the same as a function of a second molecule. A mimetic of a ligand can be identified by its ability to bind to the same sites on a polypeptide as the ligand. For example, a mimetic can be identified by a binding competition assay using a ligand and a mimetic. The structure of a mimetic can be similar to or different from the structure of the second molecule. The term can encompass molecules having portions similar to corresponding portions of the ligand in terms of structure or function.

Examples of mimetics of the common ligand NADH, for example Cibacron Blue, are described in Dye-Ligand Chromatography, Amicon Corp., Lexington MA (1980). Numerous other examples of NADH mimics, including useful modifications to obtain such mimics, are described in Everse et al. (eds.), The Pyridine Nucleotide Coenzymes, Academic Press, New York NY (1982). Particular analogs include nicotinamide 2-aminopurine dinucleotide, nicotinamide 8-azidoadenine dinucleotide, nicotinamide 1-deazapurine dinucleotide, 3-aminopyridine adenine dinucleotide, 3-acetylpyridine adenine dinucleotide, thiazole amide adenine dinucleotide, 3-diazoacetylpyridine adenine dinucleotide and 5-aminonicotinamide adenine dinucleotide. Particular mimetics can be identified and selected by ligand-displacement assays, for example competitive binding assays with a known ligand, as is well known in the art. Mimetic candidates can also be identified by searching compound databases for structural similarity to the common ligand or a mimetic.
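The database search for structural similarity mentioned above is commonly carried out by comparing molecular fingerprints with the Tanimoto coefficient. The text does not name a similarity measure, so the metric here is an assumed, swapped-in choice, and the fingerprints (sets of "on" bit indices), compound names, function names and 0.7 default cutoff are all hypothetical. In practice the fingerprints would be generated from chemical structures by a cheminformatics toolkit.

```python
from typing import AbstractSet, Dict, List, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def similarity_search(
    query: AbstractSet[int],
    library: Dict[str, AbstractSet[int]],
    threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return (compound, score) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(
        (hit for hit in hits if hit[1] >= threshold),
        key=lambda hit: hit[1],
        reverse=True,
    )


# Hypothetical fingerprints for the common ligand and three candidates.
nadh_query = {1, 2, 3, 4}
library = {"candidate_a": {1, 2, 3, 4}, "candidate_b": {1, 2, 3, 5}, "candidate_c": {9}}
print(similarity_search(nadh_query, library, threshold=0.5))
# [('candidate_a', 1.0), ('candidate_b', 0.6)]
```

Hits from such a search would only be mimetic candidates; the ligand-displacement or competitive binding assays described above would still be needed to confirm function.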

In another embodiment, the methods of the invention can be used with a ligand that is an adenosine phosphate-related molecule. Adenosine phosphate-related molecules can be selected from the group consisting of adenosine triphosphate (ATP), adenosine diphosphate (ADP), adenosine monophosphate (AMP), and cyclic adenosine monophosphate (cAMP). An adenosine phosphate-related molecule can also be a mimetic of the above-described molecules. A mimetic of an adenosine phosphate-related molecule that can be used in the invention includes, for example, quercetin, adenylylimidodiphosphate (AMP-PNP) or olomoucine.

A ligand useful in the methods of the invention can be a cofactor, coenzyme or vitamin including, for example, NAD, NADP, or ATP as described above. Other examples include thiamine (vitamin B1), riboflavin (vitamin B2), pyridoxamine (vitamin B6), cobalamin (vitamin B12), pyrophosphate, flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN), pyridoxal phosphate, coenzyme A, ascorbate (vitamin C), niacin, biotin, heme, porphyrin, folate, tetrahydrofolate, a nucleotide such as guanosine triphosphate, cytidine triphosphate, thymidine triphosphate or uridine triphosphate, retinol (vitamin A), calciferol (vitamin D2), ubiquinone, ubiquitin, α-tocopherol (vitamin E), farnesyl, geranylgeranyl, pterin, pteridine or S-adenosyl methionine (SAM).

A polypeptide can be used as a ligand in the invention. For example, a ligand can be a naturally occurring polypeptide ligand such as ubiquitin or a polypeptide hormone including, for example, insulin, human growth hormone, thyrotropin releasing hormone, adrenocorticotropic hormone, parathyroid hormone, follicle stimulating hormone, thyroid stimulating hormone, luteinizing hormone, human chorionic gonadotropin, epidermal growth factor, nerve growth factor and the like. In addition, a polypeptide ligand can be a non-naturally occurring polypeptide that has binding activity. Such polypeptide ligands can be identified, for example, by screening a synthetic polypeptide library such as a phage display library or combinatorial polypeptide library as described below. A polypeptide ligand can also contain amino acid analogs or derivatives such as those described below. Methods of isolating a polypeptide ligand are well known in the art and are described, for example, in Scopes, Protein Purification: Principles and Practice, 3rd Ed., Springer-Verlag, New York (1994); Deutscher, Methods in Enzymology, Vol. 182, Academic Press, San Diego (1990); and Coligan et al., Current Protocols in Protein Science, John Wiley and Sons, Baltimore, MD (2000).
A nucleic acid can also be used as a ligand in the invention. Examples of nucleic acid ligands useful in the invention include DNA, such as genomic DNA or cDNA, or RNA, such as mRNA, ribosomal RNA or tRNA. A nucleic acid ligand can also be a synthetic oligonucleotide. Such ligands can be identified, for example, by screening a random oligonucleotide library for ligand binding activity as described below. Nucleic acid ligands can also be isolated from a natural source or produced in a recombinant system using methods well known in the art including, for example, those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, New York (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999).

A ligand used in the invention can be an amino acid, amino acid analog or derivatized amino acid. An amino acid ligand can be one of the 20 standard amino acids or any other amino acid isolated from a natural source. Amino acid analogs useful in the invention include, for example, neurotransmitters such as gamma-aminobutyric acid, serotonin, dopamine or norepinephrine, or hormones such as thyroxine, epinephrine or melatonin. A synthetic amino acid, or analog thereof, can also be used in the invention. A synthetic amino acid can include chemical modifications of an amino acid such as alkylation, acylation, carbamylation, iodination, or any modification that derivatizes the amino acid. Such derivatized molecules include, for example, those molecules in which free amino groups have been derivatized to form amine hydrochlorides, p-toluenesulfonyl groups, carbobenzoxy groups, t-butyloxycarbonyl groups, chloroacetyl groups or formyl groups. Free carboxyl groups can be derivatized to form salts, methyl and ethyl esters or other types of esters, or hydrazides. Free hydroxyl groups can be derivatized to form O-acyl or O-alkyl derivatives. The imidazole nitrogen of histidine can be derivatized to form N-im-benzylhistidine. Naturally occurring amino acid derivatives of the twenty standard amino acids can also be included in a cluster of bound conformations including, for example, 4-hydroxyproline, 5-hydroxylysine, 3-methylhistidine, homoserine, ornithine or carboxyglutamate.

A lipid ligand can also be used in the invention. Examples of lipid ligands include triglycerides, phospholipids, glycolipids or steroids. Steroids useful in the invention include, for example, glucocorticoids, mineralocorticoids, androgens, estrogens or progestins.

Another type of ligand that can be used in the invention is a carbohydrate. A carbohydrate ligand can be a monosaccharide such as glucose, fructose, ribose, glyceraldehyde or erythrose; a disaccharide such as lactose, sucrose or maltose; an oligosaccharide such as those recognized by lectins such as agglutinin, peanut lectin or phytohemagglutinin; or a polysaccharide such as cellulose, chitin or glycogen.

Methods for producing pluralities of compounds to use as ligands, including chemical or biological molecules such as simple or complex organic molecules, metal-containing compounds, carbohydrates, peptides, peptidomimetics, lipids, nucleic acids, and the like, are well known in the art (see, for example, Huse, U.S. Patent No. 5,264,563; Francis et al., Curr. Opin. Chem. Biol. 2:422-428 (1998); Tietze et al., Curr. Biol. 2:363-371 (1998); Sofia, Mol. Divers. 3:75-94 (1998); Eichler et al., Med. Res. Rev. 15:481-496 (1995); Gordon et al., J. Med. Chem. 37:1233-1251 (1994); Gordon et al., J. Med. Chem. 37:1385-1401 (1994); Gordon et al., Acc. Chem. Res. 29:144-154 (1996); Wilson and Czarnik, eds., Combinatorial Chemistry: Synthesis and Application, John Wiley & Sons, New York (1997); Gold et al., U.S. Pat. Nos. 5,475,096 (1995), 5,789,157 (1998), and 5,270,163 (1993)). The advantage of using such a combinatorial library is that molecules do not have to be generated individually to identify a ligand that binds a polypeptide. Also, no prior knowledge of the exact characteristics of a binding polypeptide is required when using a combinatorial library. Libraries containing large numbers of natural and synthetic compounds can also be individually synthesized or obtained from commercial sources.

In addition, the invention provides a method for identifying a conformation-dependent property of a ligand. The method includes the steps of (a) determining bound conformations of a ligand bound to different polypeptides; (b) identifying two or more bound conformations of the ligand having substantially the same bound conformation; and (c) identifying a conformation-dependent property of the bound conformations of the ligand having substantially the same bound conformation, the conformation-dependent property being correlated with the bound conformation of the ligand.
A conformation-dependent property can be identified as any property that correlates with a bound conformation of a ligand such that a change in the bound conformation results in a change in the conformation-dependent property. Accordingly, a bound conformation of a ligand, or a portion thereof, can itself be a conformation-dependent property. A portion of a bound conformation of a ligand can be a contiguous fragment or a non-contiguous set of atoms or bonds. A bound conformation of a ligand, or portion thereof, can be identified by any method for determining the three-dimensional structure of a ligand, including the methods disclosed herein.

Other conformation-dependent properties include, for example, absorption and emission of heat, absorption and emission of electromagnetic radiation, rotation of polarized light, magnetic moment, spin state of electrons, or polarity, as disclosed herein, or other properties that can be identified as a spectroscopic signal. Methods known in the art for measuring changes in absorption and emission of heat that correlate with changes in bound conformation of a ligand include, for example, calorimetry. Methods known in the art for measuring changes in absorption and emission of electromagnetic radiation that correlate with changes in bound conformation of a ligand include, for example, UV/VIS spectroscopy, fluorimetry, luminometry, infrared spectroscopy, Raman spectroscopy, resonance Raman spectroscopy, X-ray absorption fine structure spectroscopy (XAFS) and the like. A change in a bound conformation of a ligand that is correlated with a change in rotation of polarized light can be measured with circular dichroism spectroscopy or optical rotation spectroscopy. A change in magnetic moment or spin state of an electron that correlates with a change in a bound conformation can be measured, for example, with electron paramagnetic resonance spectroscopy (EPR) or nuclear magnetic resonance spectroscopy (NMR).

When based on NMR data, a conformation-dependent property can be identified as an NMR signal including, for example, chemical shift, J coupling, dipolar coupling, cross-correlation, nuclear spin relaxation, transferred nuclear Overhauser effect, and any combination thereof. A conformation-dependent property can be identified by NMR methods in both the fast and slow exchange regimes. For example, in many cases the exchange rate of a complex between ligand and polypeptide is faster than the ligand spin relaxation rate (1/T1). In this situation, referred to as the "fast exchange regime," transferred nuclear Overhauser effect (NOE) experiments can be performed to measure an intra-ligand proton-proton distance (Wuthrich, NMR of Proteins and Nucleic Acids, Wiley, New York (1986); Gronenborn, J. Magn. Reson. 53:423-442 (1983)). Labeling of polypeptides is not required, and the ligand:polypeptide concentration ratio can be adjusted to minimize line broadening of the ligand resonances while retaining a strong NOE contribution from the bound form.
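Converting a transferred NOE into an intra-ligand distance is often done with the isolated spin pair approximation, in which the initial-rate NOE intensity scales as r^-6 and is calibrated against a proton pair of known separation. The text does not state which calibration is used, so this is an assumed, commonly used approach; the function name, the 2.5 Å reference distance and the intensities below are hypothetical.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an intra-ligand proton-proton distance from a transferred NOE.

    Isolated spin pair approximation: in the initial-rate regime the NOE
    intensity scales as r**-6, so r = r_ref * (I_ref / I) ** (1/6).
    Both cross peaks are assumed to come from the same spectrum.
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical numbers: a reference proton pair at 2.5 Å gives a cross peak
# 64 times stronger than the pair of interest.
print(round(noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=2.5), 3))  # 5.0
```

The sixth-root dependence makes the estimated distance fairly tolerant of intensity errors, which is one reason this approximation is popular; it becomes unreliable at long mixing times, where spin diffusion breaks the isolated-pair assumption.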

In the fast exchange regime, cross-correlated relaxation measurements can also provide structural information on ligand torsion angles (Carlomagno et al., J. Am. Chem. Soc. 121:1945-1948 (1999)). These measurements include the 1H-1H dipole-dipole cross-correlation but can be extended to other cross-correlated relaxation mechanisms, including homo- and heteronuclear chemical shielding anisotropy relaxation as well as quadrupolar relaxation. For most of these heteronuclear experiments, the natural abundance of the isotope can be exploited. Where the natural abundance of the measured isotope is not sufficient, isotope-enriched ligands can be obtained from commercial sources such as Isotec (Miamisburg, OH) or Cambridge Isotope Laboratories (Andover, MA), or prepared by methods known in the art. Another method for determining a conformation-dependent property of a ligand in the fast exchange regime is the use of residual homo- and heteronuclear dipolar couplings in partially aligned samples (Tolman et al., Proc. Natl. Acad. Sci. USA 92:9279-9283 (1995)).

In the slow exchange regime, the NMR signals arising from the bound conformation of the ligand are distinguished from those of the polypeptide to reduce resonance overlap. This can be achieved with different isotope labeling schemes for the polypeptide, the ligand or both. For large systems, perdeuteration of macromolecules and TROSY-type experiments (Pervushin et al., Proc. Natl. Acad. Sci. USA 94:12366-12371 (1997)) can be used to minimize signal losses due to fast transverse relaxation of the resonances of the complex. With appropriately labeled samples and isotope-filtered experiments, cross-correlations, cross-relaxations and residual dipolar couplings can be measured to provide the necessary structural information.



In addition, homo- and heteronuclear two- and three-bond J couplings can be obtained to provide information on torsion angles (Wuthrich, supra). For example, as shown in Table 1, the bound conformations of NADP in pharmacocluster 4 and pharmacocluster 5 differ by the torsion angle defined by the atoms PN-O5'N-C5'N-C4'N (see Figure 2 for atom labeling and bond location). Specifically, pharmacocluster 4 has a PN-O5'N-C5'N-C4'N torsion angle of 145 degrees and pharmacocluster 5 has a PN-O5'N-C5'N-C4'N torsion angle of -112 degrees. These torsion angles can be distinguished by measuring the three-bond 31P-13C4' J coupling constants that correspond to this torsion angle (Marino, Acc. Chem. Res. 32:614-623 (1999)). Briefly, two 1H-13C correlation experiments can be performed, with and without 31P decoupling during 13C evolution. The intensity ratio of the 1H4'/13C4' cross peak in the two experiments is determined by the 31P-13C4' J coupling constant.
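When coordinates of a bound conformation are available, the PN-O5'N-C5'N-C4'N torsion angle that separates pharmacoclusters 4 and 5 can be computed directly from the four atom positions. The sketch below uses the standard signed dihedral formula (positive for clockwise rotation of the front bond when viewed along the central bond). The helper `place_fourth_atom` and all coordinates are hypothetical geometry built so that the expected angles are known; they are not real NADP coordinates.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def place_fourth_atom(angle_deg: float) -> tuple:
    """Hypothetical geometry: fourth atom placed so the torsion equals angle_deg."""
    t = math.radians(angle_deg)
    return (math.cos(t), math.sin(t), 1.0)


# Hypothetical PN, O5'N and C5'N positions; C4'N is placed by angle.
p_n, o5, c5 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(torsion(p_n, o5, c5, place_fourth_atom(145.0)), 1))   # 145.0
print(round(torsion(p_n, o5, c5, place_fourth_atom(-112.0)), 1))  # -112.0
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes +145 from -112 degrees here.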

Correlation of a conformation-dependent property with a bound conformation of a ligand can be achieved by any method that has sufficient sensitivity to detect changes that correlate with changes in bound conformation of a ligand. Such a correlation can be determined by measuring a conformation-dependent property for various conformations of a ligand and determining the extent of change in the signal with change in the conformation. Signal changes that correlate with changes in conformation and that are detectable with a signal-to-noise ratio accepted in the art as significant can be used in the invention.
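The correlation step described above can be illustrated with a Pearson correlation between a scalar conformational descriptor (for example an intra-ligand distance) and the signal measured for the same complexes, together with a simple signal-to-noise check. This is a sketch under stated assumptions: the text prescribes neither Pearson correlation nor a particular cutoff, so the statistic, the 3.0 signal-to-noise threshold, the function names and the data are all hypothetical. Periodic descriptors such as torsion angles would need a circular treatment instead.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a conformational descriptor and a measured signal."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detectable(signal_change: float, noise_sd: float, min_snr: float = 3.0) -> bool:
    """True if a signal change exceeds an assumed signal-to-noise cutoff."""
    return abs(signal_change) / noise_sd >= min_snr


# Hypothetical data: an intra-ligand distance (Å) in four complexes and a
# signal measured for each of those complexes.
distance = [2.4, 2.9, 3.5, 4.1]
signal = [9.8, 7.1, 4.0, 1.2]
print(round(pearson_r(distance, signal), 3))
print(detectable(signal_change=9.8 - 1.2, noise_sd=0.4))  # True
```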

Correlation between a conformation-dependent property and a conformation can be determined for a ligand bound to any partner, so long as binding is specific and stable. For example, for purposes of establishing a correlation, changes in a conformation-dependent property that correlate with changes in bound conformation of a ligand can be determined for a ligand bound to polypeptides from different polypeptide pharmacofamilies. A bound conformation of the ligand in each complex can be determined, and a conformation-dependent property can be measured for each complex. Comparison of the bound conformation of the ligand in each complex with the measured conformation-dependent property can be used to establish a correlation. A method for establishing a correlation between an NMR signal and bound conformations of a ligand is demonstrated herein (see Example IV). Other methods for correlating spectroscopic signals with bound conformations of a ligand are known in the art including, for example, correlation of transferred NOE signals with anti and syn conformations of the nicotinamide ring in NADPH as described in Sem and Kasper, Biochemistry 31:3391-3398 (1992). Correlation of transferred NOE signals with conformation is also described in Clore and Gronenborn, J. Magn. Reson. 48:402-417 (1982).

A correlation between a bound conformation and a conformation-dependent property can also be established for a ligand bound to a non-polypeptide binding partner, because a conformation-dependent property of a ligand can be independent of interactions that differ between binding partners, so long as the ligand is in the same bound conformation when bound to the binding partners. Other binding partners include, for example, nucleic acids, carbohydrates, and synthetic organometallic complexes.
A method of the invention for identifying a conformation-dependent property of a ligand can also include the steps of (a) determining a bound conformation of a ligand, or portion thereof, bound to two or more polypeptides; (b) identifying two or more bound conformations of the ligand, or portion thereof, having substantially the same bound conformation; and (c) identifying a conformation-dependent property of the bound conformations of the ligand, or portion thereof, having substantially the same bound conformation, the conformation-dependent property being correlated with the bound conformation of the ligand, or portion thereof. A conformation-dependent property of a portion of a ligand can be identified, for example, by using the methods described above for identifying a conformation-dependent property of a ligand.

The invention also provides a method for identifying a polypeptide pharmacofamily. The method includes the steps of (a) determining bound conformations of a ligand bound to different polypeptides of a polypeptide family, and (b) identifying two or more bound conformations of the ligand having substantially different bound conformations, thereby identifying at least two polypeptide pharmacofamilies exhibiting binding specificity for the two or more substantially different bound conformations of the ligand.

A method for identifying a polypeptide pharmacofamily can include the steps of (a) determining bound conformations of a ligand bound to different polypeptides of a polypeptide family; (b) clustering bound conformations of a ligand having substantially the same conformations into pharmacoclusters; and (c) identifying a first polypeptide that binds a bound conformation of a ligand in one pharmacocluster and a second polypeptide that binds a bound conformation of a ligand in a second pharmacocluster as belonging to separate polypeptide pharmacofamilies.
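Steps (b) and (c) above can be sketched as follows: bound conformations are grouped into pharmacoclusters by an RMSD cutoff, and polypeptides whose ligand conformations fall in different pharmacoclusters receive different pharmacofamily indices. The text does not specify a clustering algorithm or cutoff, so the simple leader clustering, the 0.5 Å cutoff, the function names and the coordinates are all assumptions; the conformations are also assumed to be pre-superimposed with equivalent atoms in the same order.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD over equivalent atoms; conformations assumed pre-superimposed."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def pharmacoclusters(bound: Dict[str, Sequence[Coord]], cutoff: float) -> List[List[str]]:
    """Group polypeptides whose bound ligand conformations lie within cutoff.

    Leader clustering: each conformation joins the first cluster whose
    founding member is within `cutoff` RMSD; otherwise it starts a new
    cluster. The result depends on input order.
    """
    clusters: List[List[str]] = []
    for polypeptide, coords in bound.items():
        for cluster in clusters:
            if rmsd(bound[cluster[0]], coords) <= cutoff:
                cluster.append(polypeptide)
                break
        else:
            clusters.append([polypeptide])
    return clusters


def pharmacofamily_index(clusters: List[List[str]]) -> Dict[str, int]:
    """Polypeptides in different pharmacoclusters get different family indices."""
    return {p: i for i, members in enumerate(clusters, start=1) for p in members}


# Hypothetical three-atom ligand portion bound to three polypeptides.
bound = {
    "polypeptide_A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "polypeptide_B": [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)],
    "polypeptide_C": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
}
clusters = pharmacoclusters(bound, cutoff=0.5)
print(clusters)  # [['polypeptide_A', 'polypeptide_B'], ['polypeptide_C']]
print(pharmacofamily_index(clusters))
```

Leader clustering was chosen only for brevity; hierarchical clustering on the full pairwise RMSD matrix would avoid the input-order dependence.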

Polypeptides of a polypeptide family can be identified by their ability to specifically bind the same ligand, or portion thereof. Specific binding between a polypeptide and a ligand can be identified by methods known in the art. Methods of determining specific binding include, for example, equilibrium binding analysis, competition assays, and kinetic assays as described in Segel, Enzyme Kinetics, John Wiley and Sons, New York (1975), and Kyte, Mechanism in Protein Chemistry, Garland Pub. (1995). Thermodynamic and kinetic constants can be used to identify and compare polypeptides and ligands that specifically bind each other and include, for example, the dissociation constant (Kd), association constant (Ka), Michaelis constant (Km), inhibitor dissociation constant (Ki), association rate constant (kon) or dissociation rate constant (koff). For example, a family can be identified as having members that can specifically bind a ligand with a Kd of at most 10^-3 M, 10^-4 M, 10^-5 M, 10^-6 M, 10^-7 M, 10^-8 M, 10^-9 M, 10^-10 M, 10^-11 M, or 10^-12 M or lower.
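The constants listed above are related by Kd = koff/kon and Ka = 1/Kd, so a family membership criterion based on a maximum Kd can be applied to kinetic or equilibrium data alike. The sketch below is illustrative only: the 10^-6 M default cutoff is just one of the listed thresholds chosen as an assumption, and the function names and measured values are hypothetical.

```python
from typing import Dict, List


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Kd = koff / kon, with kon in M^-1 s^-1 and koff in s^-1 (Kd in M)."""
    return k_off / k_on


def ka_from_kd(kd: float) -> float:
    """Association constant Ka = 1 / Kd (M^-1)."""
    return 1.0 / kd


def family_members(kd_by_polypeptide: Dict[str, float], max_kd: float = 1e-6) -> List[str]:
    """Polypeptides binding the ligand with Kd at or below the chosen cutoff."""
    return [p for p, kd in kd_by_polypeptide.items() if kd <= max_kd]


# Hypothetical kinetic and equilibrium data.
kd_a = kd_from_rates(k_on=1e6, k_off=1e-2)  # about 1e-8 M
measured = {"polypeptide_A": kd_a, "polypeptide_B": 5e-4, "polypeptide_C": 2e-7}
print(family_members(measured))  # ['polypeptide_A', 'polypeptide_C']
```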

A family of polypeptides that bind a ligand can contain a pharmacofamily that binds substantially the same conformation of the ligand, or portion thereof. The methods can be used to identify any number of pharmacofamilies in a family, according to the number of different bound conformations of a ligand identified. In cases where two or more polypeptide pharmacofamilies reside in a polypeptide family, the pharmacofamilies can be distinguished according to differences in the bound conformations of a ligand bound to the polypeptides. In this case, a bound conformation of a ligand can be determined and compared according to the methods described herein. Polypeptides bound to different bound conformations of a ligand can be identified as those that do not show substantial overlap of all corresponding atoms when the bound conformations are overlaid. Thus, polypeptides that bind different bound conformations of a ligand can be separated into different pharmacofamilies. Pharmacofamilies in turn can be identified as containing polypeptides that bind substantially the same bound conformation of a ligand (see Examples II and III).

A pharmacofamily of polypeptides identified by the methods of the invention can have additional similarities that correlate with similarities in bound conformation of a ligand. For example, a polypeptide pharmacofamily identified by the methods of the invention can consist of polypeptide members that share characteristics that are unique to the pharmacofamily when compared to one or more other polypeptides in a different pharmacofamily of the same family. Such characteristics can include, for example, protein fold, evolutionary relatedness, enzymatic activity, domain structure, subcellular localization, interaction partners, or participation in a similar metabolic or signal transduction pathway. A demonstration of a correlation between ligand bound conformation and another characteristic of polypeptides in a pharmacofamily is provided in Example II, which describes correlation of bound conformation of a ligand with polypeptide structure.

An example of a polypeptide family having multiple pharmacofamilies that can be identified by the methods of the invention is the NAD(P)(H)-binding polypeptides. Polypeptide pharmacofamilies identified according to differences in bound conformations of NAD(P)(H) are described in Example II and Table 11. Thus, the methods can be used to identify a polypeptide pharmacofamily selected from the group consisting of pharmacofamily 1, pharmacofamily 2, pharmacofamily 3, pharmacofamily 4, pharmacofamily 5, pharmacofamily 6, pharmacofamily 7, and pharmacofamily 8.

The invention provides a polypeptide pharmacofamily, comprising polypeptides that bind to substantially the same bound conformation of a nicotinamide adenine dinucleotide-related molecule, selected from pharmacofamily 1, pharmacofamily 2, pharmacofamily 3, pharmacofamily 4, pharmacofamily 5, pharmacofamily 6, pharmacofamily 7, and pharmacofamily 8 as listed in Table 11.

Pharmacofamilies 1 through 8 consist of the polypeptide members provided in Table 11 (see Example II). The polypeptides in pharmacofamily 1 have the NAD(P)(H)-binding Rossmann fold in common, are all in the NAD(P)(H)-binding Rossmann SCOP superfamily, and fall into the SCOP families of the amino-terminal domain of glyceraldehyde-3-phosphate dehydrogenase, the carboxy-terminal domain of alcohol/glucose dehydrogenase, the NAD-binding domain of formate/glycerate dehydrogenase, the carboxy-terminal domain of amino acid dehydrogenase, or the amino-terminal domain of lactate and malate dehydrogenase.

The polypeptides in pharmacofamily 2 have the NAD(P)(H)-binding Rossmann fold in common, are all in the NAD(P)(H)-binding Rossmann SCOP superfamily, and fall into the SCOP families of the carboxy-terminal domain of amino acid dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase, and 6-phosphogluconate dehydrogenase.

The polypeptides in pharmacofamily 3 have the NAD(P)(H)-binding Rossmann fold in common, are all in the NAD(P)(H)-binding Rossmann SCOP superfamily, and fall into the tyrosine-dependent oxidoreductase SCOP family.

The polypeptides in pharmacofamily 4 have the heme-linked catalase fold and are in the heme-linked catalase SCOP superfamily and heme-linked catalase SCOP family.

The polypeptides in pharmacofamily 5 have the β/α TIM barrel fold in common, are all in the NAD(P)(H)-linked oxidoreductase SCOP superfamily, and fall into the aldo-keto reductase SCOP family.

The polypeptides in pharmacofamily 6 are dihydrofolate reductases that all show the dihydrofolate reductase fold and fall into the dihydrofolate reductase SCOP superfamily and family.
The polypeptides in pharmacofamily 7 have the FAD/NAD(P)(H)-binding domain fold in common, are all in the FAD/NAD(P)(H)-binding domain SCOP superfamily, and fall into the SCOP family of the amino-terminal and central domains of FAD/NAD-linked reductases.

The polypeptides in pharmacofamily 8 have the ferredoxin-like fold in common, are all in the ferredoxin-like SCOP superfamily, and fall into the NADPH-cytochrome P450 reductase or reductase SCOP families.

Polypeptide pharmacofamilies 1 through 8 were identified according to binding interactions with bound conformations of NAD(P)(H) in pharmacoclusters 1 through 8, as described in Example II. Accordingly, the invention provides a polypeptide pharmacofamily, comprising polypeptides that bind to a nicotinamide adenine dinucleotide-related molecule having a bound conformation selected from pharmacocluster 1, pharmacocluster 2, pharmacocluster 3, pharmacocluster 4, pharmacocluster 5, pharmacocluster 6, pharmacocluster 7, and pharmacocluster 8.

The invention additionally provides a method for identifying a member of a polypeptide pharmacofamily. The method consists of (a) determining a conformation-dependent property of a ligand bound to a polypeptide, and (b) determining a pharmacocluster having substantially the same conformation-dependent property as the conformation-dependent property determined for the bound ligand, wherein a polypeptide pharmacofamily binds the ligand in a conformation of the pharmacocluster, thereby identifying the polypeptide as a member of the polypeptide pharmacofamily. For example, the method can be used with a ligand such as a nicotinamide adenine dinucleotide-related molecule or an adenosine phosphate-related molecule (see Examples II and III).
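Step (b) of this method can be sketched as a nearest-reference lookup: a measured conformation-dependent property of the ligand bound to a test polypeptide is compared with reference values for each pharmacocluster, and the closest pharmacocluster within a tolerance is reported. Here the property is assumed to be the PN-O5'N-C5'N-C4'N torsion of NADP, using the 145 and -112 degree values the text gives for pharmacoclusters 4 and 5; the 30-degree tolerance and the function names are assumptions. Deriving a torsion from a measured J coupling (for example through a Karplus-type relation) is not shown.

```python
from typing import Dict, Optional


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def assign_pharmacocluster(
    measured: float, references: Dict[str, float], tolerance: float = 30.0
) -> Optional[str]:
    """Return the pharmacocluster whose reference torsion is closest to the
    measured value, or None if no reference lies within tolerance."""
    best = min(references, key=lambda name: angular_difference(measured, references[name]))
    if angular_difference(measured, references[best]) <= tolerance:
        return best
    return None


# Reference NADP torsions for pharmacoclusters 4 and 5 as given in the text;
# the tolerance above is an assumption.
REFERENCE_TORSIONS = {"pharmacocluster 4": 145.0, "pharmacocluster 5": -112.0}

print(assign_pharmacocluster(150.0, REFERENCE_TORSIONS))   # pharmacocluster 4
print(assign_pharmacocluster(-100.0, REFERENCE_TORSIONS))  # pharmacocluster 5
print(assign_pharmacocluster(20.0, REFERENCE_TORSIONS))    # None
```

Returning `None` rather than forcing a match leaves room for a test polypeptide that binds a conformation not represented by any known pharmacocluster.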

The methods of the invention allow a new member of a polypeptide pharmacofamily to be identified based on correlation of a conformation-dependent property of a bound conformation of a ligand bound to a polypeptide with a conformation-dependent property established for a bound conformation of the ligand bound to another polypeptide in the same pharmacofamily. Thus, a classification can be made based on ligand structure without requiring determination of the bound conformation of the ligand. In one embodiment, the conformation-dependent property can be a model of a bound conformation. A bound conformation of a ligand bound to a test polypeptide can be determined, and the bound conformation can be compared to a pharmacocluster according to the methods described herein. Substantial overlap between the bound conformation of the ligand bound to the test polypeptide and another bound conformation of the ligand bound to a polypeptide in a pharmacofamily can be used to identify the test polypeptide as a member of that polypeptide pharmacofamily.

In another embodiment, the conformation-dependent
property can be a spectroscopic signal that is
correlated with the conformation of a ligand. A
spectroscopic signal can be measured for the ligand bound
to a test polypeptide. The signal can be compared to a
signal correlated with a bound conformation of a ligand
bound to a polypeptide in a polypeptide pharmacofamily.
Substantial similarity between the two signals indicates
that the bound conformation of the ligand bound to the
test polypeptide is substantially similar to the bound
conformation of the ligand bound to the polypeptides of
the pharmacofamily. Thus, the test polypeptide can be
identified as a member of the polypeptide pharmacofamily.
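
The comparison of two signals can be illustrated with a short sketch. It assumes, as a hypothetical simplification, that both spectra are sampled at the same wavelengths and that "substantial similarity" is judged by a Pearson correlation coefficient at or above a chosen threshold. The 0.95 threshold, the function names and the toy intensity values are illustrative only; other similarity measures could equally be used.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sampled signals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("signals must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant signal has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def signals_match(test: Sequence[float], reference: Sequence[float],
                  threshold: float = 0.95) -> bool:
    """True if the test signal is substantially similar to the reference signal."""
    return pearson(test, reference) >= threshold


# Hypothetical intensities at five shared wavelengths.
reference = [0.1, 0.4, 0.9, 0.4, 0.1]  # signal for a known pharmacofamily member
test = [0.2, 0.8, 1.8, 0.8, 0.2]       # same shape, twice the intensity
print(signals_match(test, reference))  # True
```

Because the correlation coefficient ignores overall scale, a difference in sample concentration alone does not prevent a match; only the shape of the signal matters.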

The invention provides rapid and efficient
methods that can be used in a high-throughput screening
format. High-throughput methods can be useful for
identifying a member of a polypeptide pharmacofamily. In
a case where a conformation-dependent property can be
rapidly detected and processed, automated methods can be
created for measuring samples in rapid succession or
measuring multiple samples in parallel. Automated
methods for rapidly handling samples include, for
example, the use of robotic instruments. A combination
of automated sample handling methods with detection of a
conformation-dependent property can, therefore, be
useful in a high-throughput screening method.

According to the methods of the invention, a
compound can be identified that has greater specificity
for the polypeptides of one pharmacofamily than for other
polypeptides in the same family. Such a compound can be
used to identify new members of a pharmacofamily using a
binding assay. For example, a mimetic or analog of a
ligand can be identified that preferentially adopts a
conformation more similar to conformations in a
particular pharmacocluster than to those in other
pharmacoclusters. Such a mimetic or analog can be used
in any binding assay capable of detecting interactions
with a polypeptide, including, for example,
high-throughput methods.

A member of a polypeptide pharmacofamily can
also be identified by searching a database of bound
conformations of a ligand. For example, a bound
conformation of a ligand that binds to a polypeptide of
an identified pharmacofamily can be used as a query in a
3 dimensional search of a database containing bound
conformations of a ligand. Overlap between the query
conformation and a retrieved bound conformation of the
ligand can be used to identify a polypeptide bound to the
retrieved bound conformation of the ligand as a member of
the same polypeptide pharmacofamily as a polypeptide that
binds the query bound conformation (see Example I).
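
A 3 dimensional database search of this kind might be sketched as follows, assuming each database entry pairs a polypeptide identifier with the bound ligand coordinates, already superposed onto the query frame and listed in the same atom order. The function name, cutoff, identifiers and coordinates are hypothetical; a practical search would also need the superposition step and a mapping between equivalent atoms.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Entry = Tuple[str, Sequence[Coord]]  # (polypeptide identifier, bound conformation)


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two pre-superposed conformations with matching atom order."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def search_bound_conformations(query: Sequence[Coord], database: Sequence[Entry],
                               cutoff: float = 1.0) -> List[Tuple[str, float]]:
    """Return (polypeptide, RMSD) hits within cutoff, best overlap first."""
    hits = []
    for polypeptide_id, coords in database:
        if len(coords) != len(query):
            continue  # no atom-by-atom correspondence; skip this entry
        value = rmsd(query, coords)
        if value <= cutoff:
            hits.append((polypeptide_id, value))
    hits.sort(key=lambda hit: hit[1])
    return hits


query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
database = [
    ("polypeptide A", [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]),
    ("polypeptide B", [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0)]),
    ("polypeptide C", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
]
print([name for name, _ in search_bound_conformations(query, database)])
# ['polypeptide C', 'polypeptide A']
```

Each polypeptide returned would be a candidate member of the same pharmacofamily as the polypeptide that binds the query conformation.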

The invention also provides a method of
modeling the three dimensional structure of a
polypeptide. The method consists of (a) determining a
conformation-dependent property of a ligand bound to a
polypeptide; (b) determining a pharmacocluster having
substantially the same conformation-dependent property as
the conformation-dependent property determined for the
bound ligand, wherein a polypeptide pharmacofamily binds
the ligand in a conformation of the pharmacocluster,
thereby identifying the polypeptide as a member of the
polypeptide pharmacofamily, and (c) modeling the three
dimensional structure of the polypeptide according to a
structural model of a second member of the polypeptide
pharmacofamily.
As disclosed herein, polypeptides in a
pharmacofamily can have similar characteristics
including, for example, similar 3 dimensional structure.
Therefore, the 3 dimensional structure of a polypeptide
identified by the invention as a member of a
pharmacofamily can be modeled using a polypeptide that is
in the same pharmacofamily and for which the structure is
known. A variety of methods are known in the art for
modeling the three dimensional structure of a polypeptide
according to the amino acid sequence of the polypeptide
and a structure of a second polypeptide used as a
template. Available algorithms include, for example,
GRASP (Nicholls, A., supra), ALADDIN (Van Drie et al.,
supra), INSIGHT98 (Molecular Simulations Inc., San Diego
CA), RASMOL (Sayle et al., Trends Biochem Sci. 20:374-376
(1995)) and MOLMOL (Koradi et al., J. Mol. Graphics
14:51-55 (1996)).

A model of a polypeptide determined by the
methods of the invention can be useful for identifying a
function of the polypeptide. For example, residues of a
polypeptide that are involved in binding can be
identified using a model of the invention. Residues
identified as participating in binding can be modified,
for example, to engineer new functions into a
polypeptide, to reduce an intrinsic activity of a
polypeptide, or to enhance an intrinsic activity of a
polypeptide. In another example, a model of a
polypeptide can be compared to other polypeptide
structures to identify similar functions. Exemplary
functions that can be identified from a polypeptide
structure include binding interactions with other
polypeptides and catalytic activities.
The invention also provides a method for
constructing a ligand conformer model by determining an
average structure of the bound conformations of a ligand
in a pharmacocluster. A method for constructing a ligand
conformer model can include the steps of (a) determining
bound conformations of a ligand bound to different
polypeptides; (b) clustering two or more bound
conformations of the ligand having substantially the same
bound conformation, thereby identifying a
pharmacocluster, and (c) determining an average structure
of the bound conformations of the ligand in the
pharmacocluster. Additionally, a method for constructing
a ligand conformer model can include the steps of (a)
determining a bound conformation of a ligand bound to a
polypeptide; (b) determining a pharmacocluster having
substantially the same bound conformation as the bound
conformation, thereby identifying the bound conformation
of the ligand as a member of the pharmacocluster, and (c)
determining an average structure of the bound
conformations of the ligand in the pharmacocluster.
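
Steps (b) and (c) of the first method can be sketched with a simple greedy ("leader") clustering, in which each bound conformation joins the first cluster whose founding member lies within an RMSD cutoff, followed by per-atom averaging. This is one possible clustering scheme chosen for brevity, not necessarily the one used in the Examples. The sketch assumes the conformations are already superposed and share an atom ordering; the function names and cutoff are hypothetical. Consistent with step (b), clusters with fewer than two members are not reported as pharmacoclusters.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two pre-superposed conformations with matching atom order."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_conformations(conformations: Sequence[Sequence[Coord]],
                          cutoff: float = 1.0, min_size: int = 2) -> List[List[int]]:
    """Greedy leader clustering; returns lists of indices into `conformations`."""
    clusters: List[List[int]] = []
    for index, conformation in enumerate(conformations):
        for members in clusters:
            if rmsd(conformation, conformations[members[0]]) <= cutoff:
                members.append(index)
                break
        else:  # no existing cluster was close enough: start a new one
            clusters.append([index])
    return [members for members in clusters if len(members) >= min_size]


def average_structure(conformations: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Per-atom mean position of pre-superposed conformations."""
    n = len(conformations)
    return [tuple(sum(conf[i][k] for conf in conformations) / n for k in range(3))
            for i in range(len(conformations[0]))]


# Hypothetical two-atom bound conformations from three polypeptides.
bound = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 5.0, 0.0)],
]
for members in cluster_conformations(bound):
    print(members, average_structure([bound[i] for i in members]))
```

A greedy scheme depends on input order; a hierarchical method over all pairwise RMSD values would avoid that at higher cost.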

An average structure of the bound conformations
of a ligand in a pharmacocluster can be determined by a
variety of methods known in the art. For example, an
average structure can be determined by overlaying bound
conformations, or portions thereof, and identifying an
average location for each atom. Bound conformations in a
group to be averaged can be overlaid relative to a
single member or relative to a centroid position for each
atom. Algorithms for determining an average structure
are known in the art and include, for example, the
OVERLAY routine in INSIGHT98 (Molecular Simulations Inc.,
San Diego CA).
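
Overlaying conformations relative to a single member can be sketched with the Kabsch least-squares superposition, a standard algorithm used here as an illustrative stand-in for the OVERLAY routine rather than a description of it. The sketch assumes NumPy is available, that all conformations list equivalent atoms in the same order, and that the first conformation serves as the reference member; the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Rotate and translate `mobile` onto `reference` (N x 3 arrays) by least squares."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    p_center, q_center = p.mean(axis=0), q.mean(axis=0)
    p0, q0 = p - p_center, q - q_center
    # Singular value decomposition of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(p0.T @ q0)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p0 @ rotation.T + q_center


def average_structure(conformations, reference_index=0):
    """Overlay every conformation on one member, then average each atom position."""
    reference = np.asarray(conformations[reference_index], dtype=float)
    aligned = [kabsch_superpose(c, reference) for c in conformations]
    return np.mean(aligned, axis=0)


# Hypothetical four-atom conformation and a rotated, translated copy of it.
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                      [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 deg about z
moved = reference @ rz.T + np.array([5.0, -2.0, 3.0])
print(np.allclose(average_structure([reference, moved]), reference))  # True
```

Overlaying relative to a centroid, the other option mentioned above, could be approximated by repeating the overlay against the current average until the average stops changing.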
The format of a ligand conformer model can be
chosen based on the method used to generate the model and
the desired use of the model. In this regard, a
conformer model can be represented as a single structure.
The resulting structure can be unique compared to the
conformations in the pharmacocluster from which it was
derived. Thus, the conformer model can be a new structure
never before observed in nature. A model represented by a
single structure can be useful for making visual
comparisons by overlaying other structures with the
model. A conformer model can also be represented as a
plurality of structures incorporating all or a subset of
the bound conformations in the pharmacocluster. A model
represented by multiple structures can be useful for
identifying a range of minor deviations in the model.

In yet another representation, the conformer
model can be a volume surrounding all or a subset of the
bound conformations in the pharmacocluster. A model
represented as a volume can be useful for comparing other
structures in a fitting format, such that a structure
that fits within the volume of the model can be
identified as substantially similar to the model. One
approach that can be used to fit a structure to a volume
is comparison of equivalent surface patches using
gnomonic projection as described, for example, in Chau
and Dean, J. Mol. Graphics 5:97 (1987). Use of a gnomonic
projection to compare structures is also described in
Doucet and Weber, Computer-Aided Molecular Design: Theory
and Applications, Academic Press, San Diego CA (1996).
Algorithms that can be used to fit a structure to a
volume are known in the art and include, for example,
CATALYST (Molecular Simulations Inc., San Diego, CA) and
THREEDOM, which is part of the INTERCHEM package and
makes use of an Icosahedral Matching Algorithm (Bladon,
J. Mol. Graphics 7:130 (1989)) for the comparison and
alignment of structures. An exemplary method of
identifying a binding compound by searching a database of
structures using a gnomonic projection is provided in
Example V.
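
The volume representation can be illustrated with a deliberately simple substitute for the gnomonic-projection and icosahedral matching approaches cited above. As an assumption, the model volume is taken to be the union of spheres of a chosen radius centered on every atom of every member of the pharmacocluster, and a structure "fits" if each of its atoms lies inside that union. The radius, function name and coordinates are hypothetical, and the structure is assumed to be already positioned in the model's frame.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def fits_within_volume(structure: Sequence[Coord],
                       cluster_members: Sequence[Sequence[Coord]],
                       radius: float = 1.5) -> bool:
    """True if every atom of `structure` lies inside the union of atom-centered spheres."""
    centers = [atom for member in cluster_members for atom in member]
    return all(any(math.dist(atom, center) <= radius for center in centers)
               for atom in structure)


# Hypothetical two-member pharmacocluster of a two-atom ligand.
members = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.4, 0.3, 0.0)],
]
print(fits_within_volume([(0.1, 0.1, 0.0), (1.5, 0.2, 0.0)], members))  # True
print(fits_within_volume([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], members))  # False
```

Unlike surface-patch matching, this test ignores surface shape and searches no orientations, so it serves only to show the "fits within the volume" criterion.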

A conformer model can be useful in querying a
database of polypeptide structures to find other members
of a polypeptide pharmacofamily. For example, a member
of a polypeptide pharmacofamily can be identified by
querying a database of bound conformations of a ligand to
identify a retrieved bound conformation of a ligand that
is substantially similar to the query structure, thereby
identifying a polypeptide bound to the retrieved bound
conformation as a member of the same pharmacofamily as a
polypeptide bound to the query bound conformation. A
conformer model can also be used to identify a new member
of a polypeptide pharmacofamily by querying a database of
one or more polypeptide structures using an algorithm
that docks the conformer model, wherein a favorable
docking result with a retrieved polypeptide indicates
that the retrieved polypeptide is a member of the same
polypeptide pharmacofamily as a polypeptide bound to the
bound conformation used as a query. In the latter mode,
a potential new member of the pharmacofamily from which
the conformer model was derived can be identified. The
database queries described above can be performed with
algorithms available in the art including, for example,
THREEDOM and CATALYST.
An advantage of the invention is that a
conformer model can be used to identify a binding
compound that is specific for polypeptides of a
pharmacofamily. For example, the conformer model can be
compared to a structure of a compound or to a bound
conformation of a ligand to identify those having a
similar conformation. A conformer model can be further
used to query a database of compounds to identify
individual compounds having similar conformations.

A conformer model of the invention can also be
used to design a binding compound that is specific for
polypeptides of one or more pharmacofamilies. The
methods of the invention provide a conformer model that
can be produced according to a cluster of bound
conformations of a ligand that are specific for
polypeptides of a pharmacofamily. A conformer model
identified by these criteria can be used as a scaffold
structure for developing a compound having enhanced
binding affinity or specificity for polypeptides of a
pharmacofamily. Such a scaffold can also be used to
design a combinatorial synthesis producing a library of
compounds that can be screened for enhanced binding
affinity for polypeptide members of a pharmacofamily or
specificity for polypeptide members of one pharmacofamily
compared to polypeptide members of another
pharmacofamily. An algorithm can be used to design a
binding compound based on a conformer model including,
for example, LUDI as described by Bohm, J. Comput. Aided
Mol. Des. 6:61-78 (1992).



A conformer model need not include all atoms of
a pharmacocluster. Thus, a conformer model can include a
portion of atoms in a pharmacocluster so long as the
portion consists of contiguous atoms of a bound
conformation of a ligand and provides sufficient
information to distinguish one pharmacocluster from
another. Accordingly, a conformer model can be
constructed by overlaying corresponding fragments of
bound conformations of a ligand and obtaining an average
structure according to the methods described above. A
conformer model made from a portion of a ligand can be
advantageous due to its small size compared to a complete
structure of the ligand from which it was derived. A
conformer model based on a portion of a bound
conformation of a ligand can also be used to query a
database more rapidly and efficiently because it requires
less computer memory than is needed to manipulate and
store a structure containing all atoms of the ligand.

The invention provides a ligand conformer
model, selected from the group consisting of conformer
model 1 having coordinates listed in Table 3C, conformer
model 2 having coordinates listed in Table 4C, conformer
model 3 having coordinates listed in Table 5C, conformer
model 4 having coordinates listed in Table 6C, conformer
model 5 having coordinates listed in Table 7C, conformer
model 6 having coordinates listed in Table 8C, conformer
model 7 having coordinates listed in Table 9C, and
conformer model 8 having coordinates listed in Table 10C.
Conformer models 1-8 are average structures calculated
from pharmacoclusters 1-8, respectively. The conformer
models were determined as described in Example III and
are shown in Figure 4.
The invention also provides a moiety having
coordinates listed in Table 3C, coordinates listed in
Table 4C, coordinates listed in Table 5C, coordinates
listed in Table 6C, coordinates listed in Table 7C,
coordinates listed in Table 8C, coordinates listed in
Table 9C, or coordinates listed in Table 10C, or subsets
of the respective coordinate sets thereof. In one
embodiment the moiety is not nicotinamide adenine
dinucleotide or nicotinamide adenine dinucleotide
phosphate.

Additionally, the invention provides a method
for constructing a pharmacophore model by constructing a
model that contains one or more selected
conformation-dependent properties of one or more
pharmacoclusters. A method for constructing a
pharmacophore model can include the steps of (a)
determining bound conformations of a ligand bound to
different polypeptides; (b) identifying two or more bound
conformations of the ligand having substantially the same
bound conformation; (c) identifying a
conformation-dependent property of the bound
conformations of the ligand having substantially the same
bound conformation, the conformation-dependent property
being correlated with the bound conformation of the
ligand, and (d) constructing a model that contains one or
more selected conformation-dependent properties of one or
more pharmacoclusters.

Additionally, a method for constructing a 
pharmacophore model can include the steps of (a) 
determining bound conformations of a ligand, or portion 
thereof, bound to different polypeptides; (b) clustering
two or more bound conformations of the ligand, or portion
thereof, having substantially the same bound
conformation, thereby identifying a pharmacocluster, and
(c) determining an average structure of the bound
conformations of the ligand, or portion thereof, in the
pharmacocluster, wherein the average structure is a
pharmacophore model. A method for constructing a
pharmacophore model can also include the steps of (a)
determining a bound conformation of a ligand, or portion
thereof, bound to a polypeptide; (b) determining a
pharmacocluster having substantially the same bound
conformation as the bound conformation, thereby
identifying the bound conformation of the ligand as a
member of the pharmacocluster, and (c) determining an
average structure of the bound conformations of the
ligand in the pharmacocluster, wherein the average
structure is a pharmacophore model.

A pharmacophore model constructed by the
methods of the invention can be derived from any
conformation-dependent property that is correlated with a
pharmacocluster. An example of a pharmacophore model
useful in the methods of the invention is a conformer
model. Additionally, a pharmacophore model can include a
portion of a bound conformation, wherein the portion need
not contain contiguous atoms of a bound conformation of a
ligand so long as the pharmacophore model provides
sufficient information to distinguish one pharmacocluster
from another. Thus, a pharmacophore model can appear as
points in space unconnected by any semblance of a
covalent bond due to the absence of intervening atoms.
For example, a pharmacophore model constructed from a
pharmacocluster of nicotinamide adenine dinucleotide
bound conformations can contain a phosphate moiety and a
nicotinamide ring moiety absent the ribose moiety that
intervenes in a complete model of the structure.



A pharmacophore model can be any representation
of points in a defined coordinate system that correspond
to positions of atoms in a bound conformation of a
ligand. For example, a point in a pharmacophore model
can correlate with the center of an atom in a conformer
model. An atom of a conformer model can also be
represented by a series of points forming a line, plane
or sphere. A line, plane or sphere can form a geometric
representation designating, for example, the shape of
one or more atoms or the volume occupied by one or more
atoms.

A pharmacophore model can be represented in any
coordinate system including, for example, a 2 dimensional
Cartesian coordinate system or a 3 dimensional Cartesian
coordinate system. Other coordinate systems that can be
used include a fractional coordinate system or reciprocal
space, such as those used in crystallographic
calculations, which are described in Stout and Jensen,
supra.

In addition to a geometric description of a
bound conformation of a ligand, a pharmacophore model can
include other characteristics of atoms or moieties of the
ligand including, for example, charge or hydrophobicity.
Thus, a pharmacophore model can be a generalized
structure, which includes but does not unambiguously
describe the bound conformations of the ligand bound to
the polypeptides in the pharmacofamily from which it was
derived. For example, atoms can be represented as units
of charge such that an oxygen in a bound conformation of
a ligand can be represented by an electronegative point
in the pharmacophore model. In this example, the 
electronegative point in the pharmacophore model includes 
any electronegative atom at that particular location 
including, for example, an oxygen or sulfur. 
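
A pharmacophore model of this generalized kind can be held as a list of points, each carrying a position, a feature type and a tolerance, with no bonds between points. The sketch below is illustrative: the hypothetical feature table mirrors the oxygen-or-sulfur example above and would need extending for other characteristics, and the class name, function names, tolerance and coordinates are all assumptions. The two points are placed several angstroms apart, loosely like a phosphate and a nicotinamide moiety modeled without the intervening ribose.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical feature table: which elements may occupy a point of each type.
FEATURE_ELEMENTS: Dict[str, FrozenSet[str]] = {
    "electronegative": frozenset({"O", "S"}),
}


@dataclass(frozen=True)
class PharmacophorePoint:
    position: Coord
    feature: str            # key into FEATURE_ELEMENTS
    tolerance: float = 1.0  # radius around `position`, same units as coordinates


def point_matched(point: PharmacophorePoint,
                  atoms: Sequence[Tuple[str, Coord]]) -> bool:
    """True if some atom of an allowed element lies within the point's tolerance."""
    allowed = FEATURE_ELEMENTS[point.feature]
    return any(element in allowed and math.dist(xyz, point.position) <= point.tolerance
               for element, xyz in atoms)


def matches_model(model: Sequence[PharmacophorePoint],
                  atoms: Sequence[Tuple[str, Coord]]) -> bool:
    """A structure matches when every pharmacophore point is matched."""
    return all(point_matched(point, atoms) for point in model)


# Two unconnected points about 6 angstroms apart (hypothetical positions).
model = [PharmacophorePoint((0.0, 0.0, 0.0), "electronegative"),
         PharmacophorePoint((6.0, 0.0, 0.0), "electronegative")]
candidate = [("S", (0.2, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0)), ("O", (6.0, 0.5, 0.0))]
print(matches_model(model, candidate))  # True: S and O both count as electronegative
```

Because a point names a feature class rather than a single element, the model includes, but does not uniquely describe, the bound conformations from which it was built.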

A pharmacophore model can be constructed to
include, in addition to characteristics of the ligand
itself, characteristics of an atom or moiety of a bound
polypeptide that interacts with the ligand.
Characteristics of an interacting polypeptide atom or
moiety that can be included in a pharmacophore model
include, for example, atomic number, volume occupied,
distance from an atom of the ligand, charge,
hydrophobicity, polarity, or location relative to the
ligand. Methods for constructing a pharmacophore model
to include interacting atoms from a polypeptide are
provided in Example III.

A characteristic included in a pharmacophore
model can be incorporated into a geometric representation
using any additional representation that can be
correlated with the characteristic. For example, color
or shading can be used to identify regions having
characteristics such as charge, polarity, or
hydrophobicity. As such, the depth of shading or color,
or the hue of color, can be used to indicate the degree
of a characteristic. By way of example, a common
convention used in the art is to identify regions of
increased positive charge with deeper shades of blue,
areas of increased negative charge with deeper shades of
red, and neutral regions with white. Numeric
representations can also be used in a pharmacophore model
including, for example, values corresponding to potential
energy for an interaction, or degree of polarity.
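
The blue, white and red shading convention described above can be expressed as a small mapping from a numeric charge to an RGB color. The function name, the linear interpolation and the saturation value `max_abs` are assumptions made for illustration; visualization programs apply their own color ramps.

```python
from typing import Tuple


def charge_to_rgb(charge: float, max_abs: float = 1.0) -> Tuple[float, float, float]:
    """Map a charge to (r, g, b) in [0, 1]: blue = positive, white = neutral, red = negative.

    Charges beyond +/- max_abs are clipped to the deepest shade.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    t = max(-1.0, min(1.0, charge / max_abs))
    if t >= 0:
        return (1.0 - t, 1.0 - t, 1.0)  # white deepens toward blue
    return (1.0, 1.0 + t, 1.0 + t)      # white deepens toward red


print(charge_to_rgb(0.0))   # (1.0, 1.0, 1.0)
print(charge_to_rgb(0.5))   # (0.5, 0.5, 1.0)
print(charge_to_rgb(-2.0))  # (1.0, 0.0, 0.0)
```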



In addition, a pharmacophore model can
incorporate constraints of a physical or chemical
property of the bound conformations of a ligand in a
pharmacocluster. A constraint of a physical property can
be, for example, a distance between two atoms, an allowed
torsion angle of a bond, or the volume of space occupied
by an atom or moiety. A constraint of a chemical property
can be, for example, polarity, van der Waals interaction,
hydrogen bond, ionic bond, or hydrophobic interaction.
Such constraints can be included in a pharmacophore model
using the representations described above.

A pharmacophore model can include two or more
pharmacoclusters. In order to identify a ligand having
broad specificity for two or more polypeptide
pharmacofamilies, a pharmacophore model can be derived
from the two or more corresponding pharmacoclusters.
Additionally, in order to identify a ligand that can
preferentially bind a first polypeptide, which belongs to
a first polypeptide pharmacofamily, compared to a second
polypeptide of a second polypeptide pharmacofamily, a
pharmacophore model can incorporate constraints on
geometry or any other characteristic so as to exclude a
characteristic of the bound conformation of the ligand
bound to the second polypeptide. For example, a
geometric constraint can be a forbidden region for one or
more atoms of a bound conformation of a ligand. A
forbidden region can be identified by overlaying two
conformer models in a coordinate system and identifying a
coordinate or set of coordinates differentially occupied
by one or more atoms of the conformer models. A
pharmacophore model incorporating such a forbidden region
will be specific for a polypeptide of one pharmacofamily
over a polypeptide of a second pharmacofamily, in
accordance with the constraint incorporated.
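
The forbidden-region construction can be sketched on a coarse grid, assuming the two conformer models are already overlaid in one coordinate system. A grid cell is treated as occupied when an atom center falls inside it, which ignores atomic radii. The grid spacing, coordinates and function names are hypothetical. Cells occupied only by the model of the pharmacofamily to be excluded form the forbidden region, and a compound passes if none of its atoms falls there.

```python
import math
from typing import Iterable, Set, Tuple

Coord = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def occupied_voxels(atoms: Iterable[Coord], spacing: float = 1.0) -> Set[Voxel]:
    """Grid cells that contain at least one atom center."""
    return {(math.floor(x / spacing), math.floor(y / spacing), math.floor(z / spacing))
            for x, y, z in atoms}


def forbidden_region(favored_model: Iterable[Coord], excluded_model: Iterable[Coord],
                     spacing: float = 1.0) -> Set[Voxel]:
    """Cells occupied by the excluded pharmacofamily's model but not by the favored one."""
    return occupied_voxels(excluded_model, spacing) - occupied_voxels(favored_model, spacing)


def avoids_region(compound: Iterable[Coord], forbidden: Set[Voxel],
                  spacing: float = 1.0) -> bool:
    """True if no atom of the compound lies in a forbidden cell."""
    return occupied_voxels(compound, spacing).isdisjoint(forbidden)


model_a = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]  # first pharmacofamily (favored)
model_b = [(0.5, 0.5, 0.5), (0.5, 1.5, 0.5)]  # second pharmacofamily (excluded)
forbidden = forbidden_region(model_a, model_b)
print(forbidden)                                                     # {(0, 1, 0)}
print(avoids_region([(0.4, 0.6, 0.2), (1.2, 0.3, 0.9)], forbidden))  # True
print(avoids_region([(0.2, 1.7, 0.1)], forbidden))                   # False
```

A finer spacing, or marking every cell within an atomic radius, would give a tighter region at the cost of more cells.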

An advantage of the invention is that a
pharmacophore model can be created based on multiple
structures of the same ligand. In comparison to a
pharmacophore model derived from a single structure or
from different ligands, a pharmacophore model derived
from multiple bound conformations of the same ligand can
include a greater degree of geometric information. For
example, averaging of multiple bound conformations of the
same ligand can provide torsion angle constraints that
are not available from a single structure and not evident
from comparing different ligands.
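
To illustrate how several bound conformations of one ligand can yield a torsion angle constraint, the sketch below computes the dihedral angle of the same four atoms in each conformation, takes their circular mean, and uses the largest deviation from that mean as the allowed half-width. Treating the observed spread as the allowed range is an assumption, and the function names, atom indices, optional margin and toy geometry are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Coord, p1: Coord, p2: Coord, p3: Coord) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def _wrap(angle: float) -> float:
    """Wrap an angle difference into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def torsion_constraint(conformations: Sequence[Sequence[Coord]],
                       quad: Tuple[int, int, int, int], margin: float = 0.0):
    """Return (circular mean, allowed half-width) in degrees for atoms `quad`."""
    angles = [dihedral(*(conf[i] for i in quad)) for conf in conformations]
    rads = [math.radians(a) for a in angles]
    mean = math.degrees(math.atan2(sum(map(math.sin, rads)), sum(map(math.cos, rads))))
    return mean, max(abs(_wrap(a - mean)) for a in angles) + margin


def satisfies_torsion(angle: float, mean: float, half_width: float) -> bool:
    return abs(_wrap(angle - mean)) <= half_width


def toy_conformation(phi_degrees: float):
    """Four atoms whose p0-p1-p2-p3 dihedral equals phi (hypothetical geometry)."""
    phi = math.radians(phi_degrees)
    return [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (math.cos(phi), 1.0, -math.sin(phi))]


conformations = [toy_conformation(170.0), toy_conformation(-170.0)]
mean, half_width = torsion_constraint(conformations, (0, 1, 2, 3))
print(round(abs(mean)), round(half_width))         # 180 10
print(satisfies_torsion(175.0, mean, half_width))  # True
print(satisfies_torsion(90.0, mean, half_width))   # False
```

The circular mean matters here: an ordinary arithmetic mean of 170 and -170 degrees would give 0 degrees, the opposite conformation.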

The invention further provides a method for
identifying a binding compound for one or more members of
a polypeptide pharmacofamily by identifying a compound
having a selected conformation-dependent property of a
pharmacocluster. A binding compound can be any molecule
having selected conformation-dependent properties of a
ligand such that the binding compound can form a complex
with one or more members of one or more polypeptide
pharmacofamilies. A method for identifying a binding
compound for one or more members of a polypeptide
pharmacofamily can include the steps of contacting a
ligand with a polypeptide member of a pharmacofamily;
identifying a conformation-dependent property associated
with a bound conformation of the ligand bound to the
polypeptide; comparing the conformation-dependent
property of the bound conformation of the ligand bound to
the polypeptide with a conformation-dependent property of
a bound conformation of a ligand bound to another
polypeptide in the same pharmacofamily; and identifying a
ligand bound to the polypeptide with a
conformation-dependent property similar to that of a
bound conformation of a ligand bound to another
polypeptide in the same pharmacofamily, thereby
identifying a compound that binds one or more polypeptide
members of a pharmacofamily. A compound that binds to
one or more members of a polypeptide pharmacofamily can
be identified by determining a conformation-dependent
property by any of the methods described herein. For
example, a ligand conformation or spectroscopic signal
can provide a conformation-dependent property useful in
identifying a compound that binds to one or more members
of a polypeptide pharmacofamily.

The methods described herein for identifying a
binding compound for one or more members of a polypeptide
pharmacofamily can readily be adapted to a
high-throughput screening method. For example, methods
of rapidly detecting a conformation-dependent property in
a sequence of samples or detecting a
conformation-dependent property in parallel samples can
be applied to a high-throughput screen. One skilled in
the art will know how to adapt the methods described here
to a high-throughput screening format using, for
example, robotic manipulation of samples.

A method for identifying a binding compound for
one or more members of a polypeptide pharmacofamily can
include the steps of determining a bound conformation of
a ligand bound to a polypeptide member of a polypeptide
pharmacofamily; comparing the bound conformation of the
ligand bound to the polypeptide member of the polypeptide
pharmacofamily to a pharmacophore model; and identifying
the ligand whose bound conformation, when bound to the
polypeptide member of the polypeptide pharmacofamily,
satisfies the constraints of the pharmacophore model as a
binding compound for one or more members of the
pharmacofamily to which the polypeptide member belongs.

A pharmacophore model can be useful in querying
a database of polypeptide structures to find other
members of a polypeptide pharmacofamily. For example, a
member of a polypeptide pharmacofamily can be identified
by querying a database of bound conformations of a ligand
to retrieve a structure that fits the constraints of the
query pharmacophore model, thereby identifying the
polypeptide bound to the retrieved structure as a member
of the pharmacofamily from which the pharmacophore model
was derived. A pharmacophore model can also be used to
identify a new member of a polypeptide pharmacofamily by
querying a database of one or more polypeptide structures
using an algorithm that docks or compares the
pharmacophore model to polypeptide structures, wherein a
favorable docking or comparison identifies a polypeptide
as a member of the polypeptide pharmacofamily from which
the pharmacophore model was derived. The database queries
described above can be performed with algorithms
available in the art including, for example, THREEDOM and
CATALYST.
30 CATALYST. 
An advantage of the invention is that a
pharmacophore model can also be used to identify a
binding compound that is specific for polypeptides of one
or more pharmacofamilies. For example, a pharmacophore
model can be compared to a structure of a compound or to
a bound conformation of a ligand to identify those having
similar properties. A pharmacophore model can be further
used to query a database of compounds to identify
individual compounds having similar properties.

A pharmacophore model of the invention can also
be used to design a binding compound that is specific for
polypeptides of one or more pharmacofamilies. A
pharmacophore model identified by these criteria can be
used as a scaffold or set of constraints for developing a
compound having enhanced binding affinity or specificity
for polypeptides of one or more pharmacofamilies. Using
similar methods, a pharmacophore model can be used to
design a combinatorial synthesis producing a library of
compounds having properties consistent with or similar to
the model, which can then be screened for enhanced binding
affinity or specificity for polypeptide members of one or
more pharmacofamilies. An algorithm can be used to
design a binding compound based on a pharmacophore model
including, for example, LUDI as described by Bohm, J.
Comput. Aided Mol. Des. 6:61-78 (1992).

A compound can be identified as satisfying the
constraints of a pharmacophore model by a variety of
methods for comparing structures. For example, a
pharmacophore model that is a geometric representation,
such as a conformer model, can be overlaid with a
compound, and the best fit determined as described
herein. Substantial overlap between a compound and a
pharmacophore model can be indicated by a visual
comparison and/or a computation-based comparison using,
for example, RMSD values or torsion angle values as
described above. In a case where a pharmacophore model
is represented by constraints, a compound can be fitted
to the pharmacophore model to determine whether the
properties of the compound satisfy the constraints of the
pharmacophore model. For example, if a pharmacophore
model contains, as a constraint, a maximum distance
between atoms, a compound that satisfies the constraint
can be identified as having a distance between
corresponding atoms that is at most the maximum value.
One skilled in the art will know how to extend such
methods of comparison to any physical or chemical
constraint.
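
Checking a compound against distance constraints can be sketched as below, where each constraint names two atoms of the overlaid compound and an allowed range; a maximum-distance constraint is satisfied when the measured distance is at most the maximum. The class name, function name, atom indices and limits are hypothetical, and the compound's atoms are assumed to have already been mapped onto the corresponding pharmacophore atoms.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass(frozen=True)
class DistanceConstraint:
    atom_i: int                 # index into the compound's mapped coordinates
    atom_j: int
    min_dist: float = 0.0
    max_dist: float = math.inf  # no upper bound unless given

    def satisfied_by(self, coords: Sequence[Coord]) -> bool:
        d = math.dist(coords[self.atom_i], coords[self.atom_j])
        return self.min_dist <= d <= self.max_dist


def satisfies_all(coords: Sequence[Coord],
                  constraints: Sequence[DistanceConstraint]) -> bool:
    """True only if the compound meets every constraint of the model."""
    return all(c.satisfied_by(coords) for c in constraints)


compound = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
constraints = [DistanceConstraint(0, 1, max_dist=3.5),                # d = 3.0
               DistanceConstraint(0, 2, min_dist=4.0, max_dist=6.0)]  # d = 5.0
print(satisfies_all(compound, constraints))                               # True
print(satisfies_all(compound, [DistanceConstraint(0, 2, max_dist=4.5)]))  # False
```

Other physical constraints, such as allowed torsion angles, could be added as further classes exposing the same `satisfied_by` method.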

A compound can also be identified as satisfying
the constraints of a pharmacophore model by demonstrating
the same characteristics for one or more specific atoms
located within a volume of space defined by the geometric
constraints of the pharmacophore model. For example, in
a case where polarity is a constraint and where a
conformation of a compound can be overlaid with a
pharmacophore model, an atom that overlaps a volume of
space indicated by the pharmacophore and has a polarity
within the defined limits can be identified as satisfying
the constraints of the pharmacophore. By extension, a
compound having atoms that satisfy all constraints of a
pharmacophore is identified as a binding compound for one
or more members of the polypeptide pharmacofamily from
which the pharmacophore was produced.
Therefore, the invention provides a binding
compound identified by the above-described methods. For
example, the invention provides a binding compound
identified using a pharmacophore model or a conformer
model derived from a pharmacocluster and/or
pharmacofamily.

The invention provides a pharmacophore model,
selected from the group consisting of pharmacophore model
1 having coordinates listed in Tables 3B and 3C,
pharmacophore model 2 having coordinates listed in Tables
4B and 4C, pharmacophore model 3 having coordinates
listed in Tables 5B and 5C, pharmacophore model 4 having
coordinates listed in Tables 6B and 6C, pharmacophore
model 5 having coordinates listed in Tables 7B and 7C,
pharmacophore model 6 having coordinates listed in Tables
8B and 8C, pharmacophore model 7 having coordinates
listed in Tables 9B and 9C, and pharmacophore model 8
having coordinates listed in Tables 10B and 10C.

The invention also provides a medium comprising a storage medium and, stored in the medium, atomic coordinates selected from the atomic coordinates listed in Table 3B, 3C, 4B, 4C, 5B, 5C, 6B, 6C, 7B, 7C, 8B, 8C, 9B, 9C, 10B or 10C, or a subset thereof. In one embodiment, the medium comprises a computer readable medium. The use of a computer apparatus is convenient because atomic coordinates can be stored and accessed for manipulation including, for example, docking to a polypeptide structure or comparison to coordinates for other bound conformations of a ligand. Exemplary methods for manipulating atomic coordinates are described above.

It is understood that a computer apparatus of the invention need not itself store the atomic coordinates of the invention. The computer apparatus can contain an algorithm for viewing a structure from the coordinates or otherwise manipulating the coordinates, while the coordinates themselves are stored on a separate medium. By using various hardware, software and network combinations, the atomic coordinates can be manipulated in a variety of configurations. Such a separate medium can be another computer apparatus, a storage medium such as a floppy disk or Zip disk, or a server such as a file server, which can be accessed by a carrier wave such as an electromagnetic carrier wave. One skilled in the art will know or can readily determine appropriate hardware, software or network interfaces that allow interconnection of an invention computer apparatus.

The methods of the invention described herein can be performed in a computer apparatus using the atomic coordinates listed in Table 3B, 3C, 4B, 4C, 5B, 5C, 6B, 6C, 7B, 7C, 8B, 8C, 9B, 9C, 10B or 10C by adding the step of entering the coordinates, or a subset of the coordinates, into the computer apparatus that performs a method of the invention. One skilled in the art will know or can readily determine an algorithm instructing a computer apparatus to carry out the methods of the invention.

The invention provides a method for identifying a polypeptide that binds a ligand. The method includes the steps of (a) comparing a sequence of a polypeptide to a sequence model for polypeptides that bind a ligand; and (b) determining a relationship between the sequence and the sequence model, wherein a correspondence between the sequence and the sequence model identifies the polypeptide as a polypeptide that binds the ligand.

A method for identifying a polypeptide that binds a ligand can include the steps of (a) comparing a sequence of a polypeptide to a sequence model for polypeptides that bind a ligand, wherein the sequence model comprises representations of amino acids consisting of a subset of amino acids, the subset of amino acids having one or more atoms within a selected distance from a bound ligand in the polypeptides that bind the ligand; and (b) determining a relationship between the sequence and the sequence model, wherein a correspondence between the sequence and the sequence model identifies the polypeptide as a polypeptide that binds the ligand.

The invention also provides a method for identifying a member of a pharmacofamily. The method includes the steps of (a) comparing a sequence of a polypeptide to a sequence model for polypeptides of a pharmacofamily; and (b) determining a relationship between the sequence and the sequence model, wherein a correspondence between the sequence and the sequence model identifies the polypeptide as a member of the pharmacofamily.

According to the methods of the invention, a sequence can be identified as being similar to polypeptides in a set of polypeptides. A polypeptide set can be represented by a sequence model identifying similarity between the sequences of the polypeptides in the set. A sequence model provides a mathematical representation of a linear sequence of symbols including, for example, symbols representing amino acids or gaps in a polypeptide sequence. A sequence model provides relative probabilities for each amino acid type occurring at each position in a polypeptide sequence. Model parameters can be set based on the frequency of amino acids at each position in a set of polypeptide sequences, or on other factors including, for example, prior knowledge of naturally occurring amino acid distributions, such as Dirichlet mixture priors in a Hidden Markov Model as described in Durbin et al., supra. Thus, a sequence model can provide a statistical model to which new sequences can be compared to determine whether a new sequence is similar to the polypeptides in the set from which the model was generated.
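
To make the notion of position-specific probabilities concrete, the sketch below builds a simple profile from equal-length aligned sequences. It then scores a new sequence by log-odds against a uniform background. The sketch uses a flat pseudocount in place of the Dirichlet mixture priors mentioned above, so it is closer to a basic position specific score matrix than to a full profile Hidden Markov Model. The function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column amino acid probabilities from equal-length aligned sequences.

    Gap characters ('-') are not counted; a flat pseudocount is added to every
    amino acid as a simple stand-in for Dirichlet mixture priors."""
    profile = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(s[i] for s in aligned_seqs if s[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile, seq):
    """Sum over positions of log2(P(residue | column) / uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, seq)
        if aa in AMINO_ACIDS  # gaps and unknown symbols contribute nothing
    )


training = ["ACDE", "ACDE", "ACDQ"]
profile = build_profile(training)
print(log_odds_score(profile, "ACDE") > log_odds_score(profile, "WWWW"))  # True
```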

Sequence models and methods for making and using sequence models are well known in the art as described, for example, in Durbin et al., supra. Several types of sequence models can be used in the methods of the invention including, for example, Hidden Markov Models (HMM), which have been described, for example, in Eddy, Bioinformatics 14:755-63 (1998); Position Specific Score Matrices (PSSM), which have been described, for example, in Gribskov et al., Proc. Natl. Acad. Sci. USA 84:4355-58 (1987); Support Vector Machines (SVM), which have been described, for example, in Jaakkola et al., J. Comput. Biol. 7:95-114 (2000); or Neural Networks as described, for example, in Baldi and Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, Cambridge, MA (1998).

A sequence model can be produced from a variety of polypeptide sets containing polypeptides with similar sequences. A polypeptide set used to produce a sequence model can be referred to as a training set, and the resultant sequence model can be referred to as trained by the polypeptide set. A sequence model provides a statistical description of the occurrence of specific amino acids at specified positions in a training set of polypeptides. An advantage of a sequence model is that it can be produced in cases where an alignment has not been produced, or used to identify similarities not evident in a traditional pairwise alignment method such as BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) or FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444-2448 (1988)).

A sequence model can be produced using full sequences of polypeptides or portions of a polypeptide sequence. A portion of a polypeptide useful in making a sequence model of the invention can include, for example, a region of sequence identified by structural criteria, such as correlation with a domain or polypeptide fold, or by functional criteria, such as correlation with a binding activity, enzymatic activity or other biological activity. A portion of a polypeptide useful in producing a sequence model can also include positions of amino acids that are not contiguous in the polypeptide from which they are derived. For example, a subset of amino acids can be identified according to structural criteria, such as proximity in the three dimensional structure, or functional criteria, such as participation in a binding activity, enzymatic activity or other biological activity of a polypeptide.



Therefore, a sequence model of the invention can contain representations of amino acids consisting of a subset of amino acids, the subset of amino acids having one or more atoms within a selected distance from a bound ligand in a set of polypeptides. A sequence model of the invention can be produced by the steps of: (a) identifying a subset of amino acids having one or more atoms within a selected distance from a bound conformation of a ligand in a set of polypeptides that bind the ligand; and (b) producing a sequence model, amino acids of the sequence model consisting of the subset of amino acids.
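
Identifying the subset of amino acids near a bound ligand reduces to a distance search over atomic coordinates. A minimal sketch follows. It assumes residue and ligand coordinates have already been extracted from a structure file into plain Python data structures. The data layout and the 4.0 angstrom default are assumptions, not values taken from the examples.

```python
import math


def binding_site_residues(residue_atoms, ligand_atoms, cutoff=4.0):
    """Residue IDs with at least one atom within `cutoff` angstroms of any ligand atom.

    residue_atoms: dict mapping a residue ID to a list of (x, y, z) atom positions.
    ligand_atoms: list of (x, y, z) positions for the bound ligand conformation.
    The 4.0 angstrom default is illustrative only."""
    return [
        res_id
        for res_id, atoms in residue_atoms.items()
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_atoms)
    ]


residues = {"Gly12": [(1.0, 0.0, 0.0), (1.5, 1.0, 0.0)], "Ser45": [(10.0, 0.0, 0.0)]}
ligand = [(0.0, 0.0, 0.0)]
print(binding_site_residues(residues, ligand))  # ['Gly12']
```

Taking the union of such residues over all structures in the set, and mapping them to alignment positions, would give one possible subset of amino acids for training the model.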

In addition, a sequence model of the invention can contain representations of amino acids consisting of a subset of amino acids, the subset of amino acids having one or more atoms within a selected distance from a bound ligand in the polypeptides of the pharmacofamily. A sequence model of the invention can be produced by the steps of: (a) identifying a subset of amino acids in a pharmacofamily having one or more atoms within a selected distance from a bound conformation of a ligand; and (b) producing a sequence model, amino acids of the sequence model consisting of the subset of amino acids. Exemplary methods for making a sequence model based on either full sequences of polypeptides in a set of polypeptides or based on a subset of positions in the sequences of polypeptides in a set of polypeptides are provided in Examples VII, VIII and IX.
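
Once binding-site residues have been mapped to columns of a multiple alignment of the pharmacofamily, the training set for a subset-based sequence model can be produced by keeping only those columns. The sketch below assumes that mapping has already been done and that columns are given as 0-based indices. `restrict_to_positions` is a hypothetical helper.

```python
def restrict_to_positions(aligned_seqs, positions):
    """Keep only the alignment columns listed in `positions` (0-based).

    The columns need not be contiguous; they might, for example, correspond
    to binding-site residues identified from structures."""
    return ["".join(seq[i] for i in positions) for seq in aligned_seqs]


print(restrict_to_positions(["MKVLAG", "MRVIAG"], [1, 3, 5]))  # ['KLG', 'RIG']
```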

Comparison of a polypeptide sequence to sequences in a set of polypeptide sequences can be conveniently carried out by comparing the polypeptide sequence to a sequence model produced from the polypeptide sequences in the set. Such a comparison can indicate the likelihood that the sequence is accurately represented by the model, or that the sequence is a member of the set of polypeptides used to create the sequence model. A polypeptide with a high probability of being similar to a sequence model can be identified as having a high probability of being a member of the set of polypeptides from which the sequence model was derived. For example, a sequence model can be produced based on the polypeptides in a pharmacofamily, and this sequence model can be used to search a database for new members of the respective pharmacofamily. Exemplary methods for producing a sequence model and using the model to identify new members of a pharmacofamily are described in Examples VII, VIII and IX.

A probability that a polypeptide sequence has a correspondence with a sequence model can be determined from a probability score. For example, HMMER, which is described in Examples VII to IX, can be used to compare one or more sequences to a Hidden Markov Model. HMMER indicates the probability that a given sequence belongs to the pharmacofamily used to produce a Hidden Markov Model by reporting an E value for each sequence compared. Lower E values resulting from comparison of a sequence to a sequence model correspond to a higher probability that the compared sequence belongs to the pharmacofamily used to produce the sequence model. Therefore, an E value can be used to determine whether a similarity between a sequence and a sequence model is statistically relevant.
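
In practice, the E values reported by HMMER must be read from its output before any cutoff is applied. The sketch below assumes the per-sequence table written by the HMMER3 `hmmsearch --tblout` option. In that table, whitespace-separated field 1 is the target name and field 5 is the full-sequence E value. The HMMER version used for the examples may format its output differently, so the field positions should be checked against the actual files.

```python
def read_tblout_evalues(lines):
    """Map target name -> full-sequence E value from HMMER3 --tblout lines.

    Assumes the whitespace-delimited per-sequence table, where field 1 is the
    target name and field 5 is the full-sequence E value; comment lines start
    with '#'."""
    evalues = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        evalues[fields[0]] = float(fields[4])
    return evalues


sample = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seqA  -  famX  -  1.2e-30  105.3  0.1  1.5e-30  105.0  0.1  1.0  1  1  0  1  1  1  1  -",
    "seqB  -  famX  -  0.37  12.1  0.0  0.5  11.8  0.0  1.1  1  1  0  1  1  1  0  -",
]
print(read_tblout_evalues(sample))  # {'seqA': 1.2e-30, 'seqB': 0.37}
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.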



A statistically relevant similarity can be identified as having an E value less than a desired cutoff value. An E value below 1 can be considered to indicate a correspondence, or a high probability of correspondence. Increasing the E value cutoff will include a larger number of sequences as corresponding to the sequence model. Thus, a larger E value cutoff can be used in cases where it is desired to minimize the number of members of the pharmacofamily that are missed. More specifically, increasing the E value cutoff, for example, to 2, 5, 10, 50 or 100 or higher, will increase the percentage of true positives identified. However, it will also increase the percentage of false positives identified. In cases where it is desired to minimize incorrectly identified sequences, the E value cutoff can be decreased, for example, to 0.5, 0.2, 0.1 or 0.01 or lower. Thus, one skilled in the art can determine an appropriate E value cutoff based on the desired or tolerable numbers of true and false positives identified.
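
The effect of the cutoff on true and false positives can be tabulated directly when a validation set of known members is available. A minimal sketch follows. It assumes E values keyed by sequence identifier and a set of identifiers treated as true members; both are hypothetical inputs.

```python
def cutoff_tradeoff(evalues, known_members, cutoffs):
    """For each cutoff, count (true positives, false positives) among hits with E < cutoff.

    evalues: dict of sequence ID -> E value from a search with one sequence model.
    known_members: set of IDs treated as true members, e.g. a validation set."""
    table = {}
    for cutoff in cutoffs:
        hits = {sid for sid, e in evalues.items() if e < cutoff}
        table[cutoff] = (len(hits & known_members), len(hits - known_members))
    return table


evalues = {"a": 1e-20, "b": 0.3, "c": 4.0, "d": 0.05, "e": 7.0}
members = {"a", "b", "c"}
for cutoff, (tp, fp) in cutoff_tradeoff(evalues, members, [0.1, 1.0, 10.0]).items():
    print(cutoff, tp, fp)
# 0.1 1 1
# 1.0 2 1
# 10.0 3 2
```

As the cutoff rises, true positives grow, and so do false positives. This is the trade-off described above.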

An E value cutoff can also be made according to the shape of a curve in a plot of -ln(E) versus L, where L is the location of compared sequences in a list ranked by descending E value. For example, an E value cutoff can be identified as a significant inflection in the curve. An inflection point is a point where the second derivative of -ln(E) with respect to L is zero. An inflection in the curve that identifies an appropriate E value cutoff can be identified by its magnitude and/or its position relative to a specified E value. For example, an E value cutoff for determining statistically relevant similarity can be at a statistically significant inflection point before a specified threshold value of E is reached in a plot of -ln(E) versus L, or at the last inflection point before a specified threshold value of E in such a plot. A statistically significant inflection point can be identified as one at which -ln(E) before the inflection point differs from -ln(E) after the inflection point by at least 50. Smaller differences in -ln(E) at the inflection point including, for example, at least 10, at least 5, at least 2, at least 1.5, or at least 1 or lower, can identify a cutoff for statistically relevant similarity, for example, when longer sequence subsets are used or when sequence models are compared to relatively long sequences. In addition, a cutoff for statistically relevant similarity can be indicated by a larger difference in -ln(E) at the inflection including, for example, 100 or 500 or higher, for example, when shorter sequence subsets are used or when sequence models are compared to relatively short sequences. Examples of determining E value cutoffs according to the shape of a plot of -ln(E) versus L are provided in Examples VII and VIII.
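
A discrete version of the inflection criterion can be sketched as follows. Hits are ranked by descending E value, as in the text, and the sketch looks for the largest jump in -ln(E) between adjacent ranks. Treating that jump as the inflection point is a simplifying assumption. So is using `min_jump` as the required -ln(E) difference. The sketch does not fit a curve or compute second derivatives.

```python
import math


def jump_cutoff(evalues, min_jump=2.0):
    """Return an E value cutoff at the largest jump in -ln(E) between adjacent
    ranks, or None if no jump reaches min_jump.

    Hits are ranked by descending E value, so -ln(E) rises with rank L.
    All E values are assumed to be positive."""
    ranked = sorted(evalues, reverse=True)        # descending E
    y = [-math.log(e) for e in ranked]            # -ln(E) at each rank L
    best_k, best_jump = None, min_jump
    for k in range(len(y) - 1):
        jump = y[k + 1] - y[k]
        if jump >= best_jump:
            best_k, best_jump = k, jump
    if best_k is None:
        return None
    # Sequences with E strictly below this value lie after the jump.
    return ranked[best_k]


evalues = [5.0, 3.0, 1.0, 1e-25, 1e-30]
print(jump_cutoff(evalues))                  # 1.0
print(jump_cutoff(evalues, min_jump=100.0))  # None
```

Setting `min_jump` to 50, for example, would accept only a separation of the size described above for a statistically significant inflection.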

A member of a pharmacofamily can also be identified by determining relative E values from the set of E values determined for sequences identified in a search of a database using a sequence model. As demonstrated in Example X, a relative E value can be a cross correlation value (XCorr), which is calculated as follows. First, an E value is determined for a particular sequence based on a search of a database using a sequence model. Next, the negative natural log of this E value (-ln(E)) is calculated. Finally, XCorr is calculated as the ratio of the -ln(E) for the particular sequence to the sum of the -ln(E) values obtained for that sequence with the sequence models of all pharmacofamilies. Differences in XCorr values for candidate sequences identified in a sequence search can be used to identify members that are included in and excluded from a particular pharmacofamily. As demonstrated in Example IX, a plot of XCorr values vs. L can be particularly useful in identifying members of a pharmacofamily in cases where the magnitude of the drop between members and nonmembers in a plot of -ln(E) vs. L is relatively small.

In general, sequence members of a pharmacofamily can be identified as having an XCorr value larger than about 0.5. XCorr values larger than 0.5, such as 0.6, 0.7, 0.8, 0.9 or 1, indicate that the probability that the sequence belongs to the specified pharmacofamily is much higher than the probability that it belongs to a different pharmacofamily. Sequences with an XCorr value close to zero for a given pharmacofamily have a greater probability of belonging to another pharmacofamily.
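
The XCorr calculation can be written out directly. This sketch assumes one E value per pharmacofamily model for the sequence in question. It also assumes every E value is below 1, so each -ln(E) term is positive and the XCorr values lie between 0 and 1 and sum to 1.

```python
import math


def xcorr(evalues_by_family):
    """XCorr of one sequence against each pharmacofamily model.

    evalues_by_family maps family name -> E value from that family's model.
    XCorr(f) = -ln(E_f) / sum over all families g of -ln(E_g).
    Assumes every E value is below 1 so that each -ln(E) term is positive."""
    scores = {fam: -math.log(e) for fam, e in evalues_by_family.items()}
    total = sum(scores.values())
    return {fam: s / total for fam, s in scores.items()}


r = xcorr({"family_1": 1e-20, "family_2": 1e-2})
print(round(r["family_1"], 3))  # 0.909
```

Here -ln(1e-20) and -ln(1e-2) are 20 ln 10 and 2 ln 10. The first family therefore receives 20/22, or about 0.91, which is above the 0.5 guideline.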

The methods of parsing protein sequences into pharmacofamilies described herein are useful for identifying structurally related proteins, such as proteins having structurally related binding sites. The methods for identifying pharmacofamilies and members thereof can be used in combination with gene family based drug discovery methods, such as those described in WO 99/60404 (1999, Triad Therapeutics Inc (Sem DS): Multi-partite ligands and methods of identifying and using same), to find inhibitors having nanomolar affinity for members of one or more pharmacofamilies. Using such methods, focused chemical libraries of potential inhibitors can be designed and synthesized, or otherwise identified and obtained, based on the common structural properties of the binding sites of protein members of a particular pharmacofamily. These focused libraries can be screened to identify inhibitors having high affinity for members of a particular pharmacofamily. The inhibitors can be further screened for specificity toward members of one pharmacofamily compared to members of other pharmacofamilies within the same gene family. Thus, methods of assigning a protein to a pharmacofamily based on amino acid sequence alone, such as those described in Example X and employed by the Gene Family Profiler program described therein, can increase the efficiency with which high affinity inhibitors are identified.

One skilled in the art will be able to identify a statistically relevant similarity between an identified sequence and a sequence model based on any known method of statistical analysis including, for example, those that use scores other than E values. Based on the description herein, which has been exemplified with E values, one skilled in the art will be able to adapt a variety of statistical analysis methods to the methods of the invention.

The methods of the invention can be performed in an iterative fashion in which E value cutoffs are adjusted until a desired set of sequences is identified. A desired set can be, for example, a validation set as described in Examples VII and VIII. A validation set is understood to be a collection of polypeptides including all known members of a group of polypeptides, such as a pharmacofamily.

Iterations in the methods of the invention can also include modifying the training set based on newly identified members of a set of polypeptides to improve the sequence model. Thus, the methods of the invention can include the steps of (a) comparing a sequence of a polypeptide to a sequence model for polypeptides that bind a ligand; (b) determining a relationship between the sequence and the sequence model, wherein a correspondence between the sequence and the sequence model identifies the polypeptide as a polypeptide that binds the ligand; (c) producing a sequence model with a set of sequences, the set of sequences consisting of sequences of polypeptides having a subset of amino acids, the subset of amino acids having one or more atoms within a selected distance from a bound ligand in said polypeptides that bind said ligand; (d) adding the sequence of the identified polypeptide that binds the ligand to the set of sequences; and (e) repeating steps (a) through (c) one or more times. In addition, steps (a) through (d) can be repeated multiple times to iteratively improve the sequence model. For example, the method can be repeated 2 or more times, 3 or more times, 5 or more times, or 10 or more times.
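
The iterative scheme can be sketched with a toy profile model. In this sketch, a log-odds score threshold stands in for the E value cutoff, and sequences are assumed to be ungapped and of equal length. The model is rebuilt each round from all sequences accepted so far. The function names, the pseudocount and the `max_rounds` limit are illustrative assumptions rather than parameters from the examples.

```python
import math
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Per-position probabilities with a flat pseudocount (assumes ungapped sequences)."""
    prof = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = len(seqs) + pseudocount * len(AA)
        prof.append({a: (counts[a] + pseudocount) / total for a in AA})
    return prof


def score(prof, seq):
    """Log-odds score against a uniform background of 1/20 per amino acid."""
    return sum(math.log2(col[a] * len(AA)) for col, a in zip(prof, seq))


def iterate_model(training, candidates, threshold, max_rounds=10):
    """Score candidates, add those at or above threshold to the training set,
    rebuild the model, and repeat until no new members are found."""
    members = list(training)
    for _ in range(max_rounds):
        prof = build_profile(members)
        new = [s for s in candidates if s not in members and score(prof, s) >= threshold]
        if not new:
            break
        members.extend(new)
    return members


print(iterate_model(["ACDEF", "ACDEF"], ["ACDEW", "WWWWW", "ACDQW"], threshold=1.0))
# ['ACDEF', 'ACDEF', 'ACDEW', 'ACDQW']
```

The `max_rounds` limit guards against a model that keeps drifting toward unrelated sequences as it absorbs new members.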

The method can also be iterated according to the following steps: (a) comparing a sequence of a polypeptide to a sequence model for polypeptides of a pharmacofamily; (b) determining a relationship between the sequence and the sequence model, wherein a correspondence between the sequence and the sequence model identifies the polypeptide as a member of the pharmacofamily; (c) producing a sequence model with a set of sequences, the set of sequences consisting of sequences of polypeptides in the pharmacofamily; (d) adding a sequence of the identified member of the pharmacofamily to the set of sequences; and (e) repeating steps (a) through (c) one or more times.

An ideal sequence comparison method would find all true positives and no false positives. In practice, a trade-off between these two goals is often required. A search can be sensitive enough to find all true positives but also find false positives, or selective enough to find no false positives but miss some of the true positives. The method of differential filtering can be used to minimize this trade-off, as described below.

The invention also provides a method for identifying a member of a pharmacofamily, wherein the method includes the steps of (a) comparing a sequence of a polypeptide to a sequence model and a differential sequence model; and (b) determining a relationship between the sequence and the sequence models, wherein a correspondence between the sequence and the sequence models identifies the polypeptide as a member of the pharmacofamily. The method can further include the following steps: (c) producing a sequence model with a set of sequences, the set of sequences consisting of sequences of polypeptides in the pharmacofamily; (d) adding a sequence of the identified member of the pharmacofamily to the set of sequences; and (e) repeating steps (a) through (c) one or more times. In addition, steps (a) through (d) can be repeated multiple times to iteratively improve the sequence model. For example, the method can be repeated 2 or more times, 3 or more times, 5 or more times, or 10 or more times.

The discriminative ability of a sequence model to identify members of a set of polypeptides can be augmented by creating multiple models having differential discriminative modes. Differential sequence models can represent, or emphasize, different aspects of a set of polypeptides. For example, a first model representing a structural alignment of polypeptides in a pharmacofamily can represent different aspects of the pharmacofamily members than a second, differential model emphasizing a binding site region of the same polypeptides. Sequentially filtering the sequences identified with one sequence model through a second, differential sequence model reduces the overall rate of false positives. This is demonstrated in Example VII, where it is shown that differential filtering can decrease the number of falsely identified sequences while minimizing the decrease in the percentage of correctly identified sequences.

Different types of sequence models can be used to compare sequences by differential filtering. For example, the sequences identified in a database search with a Hidden Markov Model can be sequentially filtered with a Neural Network model. Furthermore, differential filtering can be performed with a combination of different amino acid training sets and different types of sequence models. For example, the sequences identified in a database search with a Hidden Markov Model trained with all of the amino acid positions present in a structural model of a polypeptide can be filtered with a Neural Network model trained with a subset of amino acid positions, including those residues that are proximal to a bound ligand. Although the above examples describe differential filtering in a sequential mode, it is understood that differential sequence models can also be compared to one or more sequences in a parallel mode, and the results compared to identify sequences similar to polypeptides in a set such as a pharmacofamily.
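
Differential filtering in both sequential and parallel modes can be expressed as set operations on the hits from each model. The sketch assumes that each model's search yields an E-value-like score per sequence identifier. A Neural Network model would need its output converted to a comparable score first. In parallel mode, requiring agreement of all models is one possible way of comparing the results, not the only one.

```python
def sequential_filter(first_evalues, second_evalues, first_cutoff, second_cutoff):
    """Keep hits passing a liberal cutoff under the first model that also pass
    the second (differential) model's cutoff. Each evalues argument maps
    sequence ID -> E value; sequences the second model did not report are dropped."""
    stage_one = {sid for sid, e in first_evalues.items() if e < first_cutoff}
    return {sid for sid in stage_one
            if second_evalues.get(sid, float("inf")) < second_cutoff}


def parallel_filter(evalues_per_model, cutoffs):
    """Score against every model independently and keep sequences that pass
    each model's own cutoff."""
    passing = [
        {sid for sid, e in evals.items() if e < cutoff}
        for evals, cutoff in zip(evalues_per_model, cutoffs)
    ]
    return set.intersection(*passing) if passing else set()


hmm = {"a": 1e-10, "b": 0.5, "c": 3.0, "d": 8.0}
nn = {"a": 1e-4, "b": 0.01, "c": 2.0}
print(sorted(sequential_filter(hmm, nn, 10.0, 0.1)))  # ['a', 'b']
```

Using a liberal first cutoff and a stricter second cutoff mirrors the strategy described for curves without a clear inflection.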

A determination as to whether differential filtering should be used can be made from the shape of a plot of -ln(E) versus L produced as described above. If there is a sharp drop in E value (a large second derivative), and all the known members among the identified sequences occur at lower E values than the location of the drop, then one model can be adequate. However, if the curve does not have significant inflections, or if known members occur at higher E values than a significant inflection, then a clear E value cutoff can be difficult to determine. In such cases, choosing a liberal E value cutoff, sufficient to include all true positives, and applying differential filtering to the resulting subset of sequences can decrease the number of false positives while minimizing any decrease in the number of true positives.

When multiple sequence models are used, it can be advantageous to increase the E value cutoff for sequence models based on short sequences or small amino acid position sets, as shorter sequences tend to produce larger E values. An appropriate cutoff to use can be determined from test runs on a validation set of known matches and mismatches, such as described in Examples VII and VIII.

Validation of a sequence model can also be accomplished by using only some of the known members of a pharmacofamily to produce, or train, a sequence model, and then testing the ability of the model to find the remaining members in a database. In such a case, the members in the database that were left out of the training set will generally be scored lower (higher E value) than those included in the training set. The scores of the omitted sequences can indicate a relative upper limit on the score (that is, the smallest E value) for an appropriate cutoff when a model trained with all known members is used to search for new and/or unknown members. A sequence that scores in the same region as the omitted known members has a significant probability of being a member, whatever its E value.
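
The hold-out procedure can be summarized in two steps. First, take the largest E value that the reduced model assigns to the omitted members. Then list unknown sequences that the full model scores in that region. The helper names and the use of the maximum are assumptions made for this sketch.

```python
def held_out_limit(partial_model_evalues, held_out):
    """Largest E value assigned to the omitted known members by the model
    trained without them."""
    return max(partial_model_evalues[sid] for sid in held_out)


def candidates_near_members(full_model_evalues, known_members, limit):
    """Sequences not already known to be members that the full model scores
    in the same region as the omitted members (E value at or below the limit)."""
    return sorted(
        sid for sid, e in full_model_evalues.items()
        if sid not in known_members and e <= limit
    )


partial = {"h1": 1e-5, "h2": 0.2, "x": 0.4}
limit = held_out_limit(partial, {"h1", "h2"})
full = {"t1": 1e-50, "h1": 1e-30, "h2": 1e-12, "x": 0.1, "y": 3.0}
print(limit, candidates_near_members(full, {"t1", "h1", "h2"}, limit))  # 0.2 ['x']
```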

The methods of the invention can also be used to distinguish to which set of polypeptides an identified polypeptide belongs. For example, the methods can be used to determine to which pharmacofamily a polypeptide belongs. As described above, a number of pharmacofamilies can be identified within a family of polypeptides. A sequence of a polypeptide member of a family can be compared to sequence models derived from each pharmacofamily within the family of polypeptides. Based on probability scores for the relationship of the polypeptide sequence to each sequence model, the pharmacofamilies to which the sequence is most likely to belong can be determined. Specifically, the sequence would have the highest probability of belonging to the pharmacofamily used to derive the sequence model for which the most favorable probability score resulted.
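
When E values are the probability scores, assigning a sequence to the pharmacofamily whose model gives the most favorable score amounts to taking the minimum E value. A minimal sketch follows. It assumes one E value per pharmacofamily model for the sequence; the family names are placeholders.

```python
def assign_pharmacofamily(evalues_by_family):
    """Return the family whose model gave the smallest (most favorable) E value,
    together with all families ranked from most to least favorable."""
    ranking = sorted(evalues_by_family, key=evalues_by_family.get)
    return ranking[0], ranking


print(assign_pharmacofamily({"PF1": 0.3, "PF2": 1e-12, "PF3": 5.0}))
# ('PF2', ['PF2', 'PF1', 'PF3'])
```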

The probability that a sequence belongs to, or is accurately modeled by, a particular sequence model can easily be determined, for example, by comparison of probability scores such as E values. A matrix of probability scores for all known members of a polypeptide family with each pharmacofamily sequence model can be used to expose any gaps in the coverage of the family by the pharmacofamily sequence models. The gaps can be correlated to outlying sequences that were not adequately modeled by any of the pharmacofamily sequence models. The number of such gaps indicates the degree to which the collection of pharmacofamily sequence models forms a basis set that spans the sequence space of the polypeptide family.
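
The coverage check on a matrix of scores can be sketched as follows. The sketch assumes that E values are stored per sequence and per pharmacofamily model. It uses an E value below 1 as the criterion for correspondence, as suggested above.

```python
def coverage_gaps(score_matrix, cutoff=1.0):
    """score_matrix maps sequence ID -> {family model: E value} for all known
    members of a polypeptide family. Returns the sequences that no
    pharmacofamily model matches below the cutoff (gaps in coverage)."""
    return sorted(
        sid for sid, row in score_matrix.items()
        if not any(e < cutoff for e in row.values())
    )


matrix = {
    "s1": {"PF1": 1e-9, "PF2": 4.0},
    "s2": {"PF1": 6.0, "PF2": 2.5},
    "s3": {"PF1": 3.0, "PF2": 0.02},
}
print(coverage_gaps(matrix))  # ['s2']
```

The length of the returned list is one simple measure of how completely the pharmacofamily models span the family.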

Based on the conformations of a ligand identified from the pharmacoclusters associated with each pharmacofamily, a binding compound can be identified or designed as described previously herein. Thus, a polypeptide sequence can be identified and compared to a set of pharmacofamilies in a family of polypeptides to predict or determine specificity toward individual binding compounds based on conformation. Similar methods of determining the probability that any sequence belongs to a pharmacofamily can be used to extend a pharmacofamily sequence model through a proteome, such that members of a given pharmacofamily can be identified in the proteome, for example, as described in Example IX.

Although the above description has been made with reference to polypeptide sequences as examples, one skilled in the art will know that similar methods can be applied to sequence models derived from polynucleotide sequences.

It is understood that modifications which do not substantially affect the activity of the various embodiments of this invention are also provided within the definition of the invention provided herein. Accordingly, the following examples are intended to illustrate but not limit the present invention.

EXAMPLE I

Identification of Polypeptide Pharmacofamilies Based on Bound Conformations of NAD(P)(H) Ligands

This example describes identification of ligand conformer groups and corresponding polypeptide pharmacofamilies based on the conformations of NAD(P)(H) when bound to polypeptide oxidoreductases.

The oxidoreductases form a family of polypeptides that bind NAD(H) and NADP(H). In order to identify pharmacofamilies within the family of oxidoreductases, bound conformations of NAD(P)(H) were determined by searching the Protein Data Bank. Bound conformations from 156 structures were clustered into separate pharmacoclusters. Pharmacofamilies were then identified as groups of polypeptides whose bound NAD(P)(H) conformations fall within the same pharmacocluster.

Structure files containing polypeptides with bound NAD(P)(H) were identified from the Protein Data Bank by keyword searches using the database software. Keywords included "NAD," "NADH," "NADP," "NADPH," "oxidoreductase," "dehydrogenase" and "reductase." Cluster analysis was performed using the algorithm COMPARE (Chiron Corp, 1995; distributed by Quantum Chemistry Program Exchange, Indianapolis IN) in combination with visual inspection. All clusters were visually inspected using Insight 98 for outliers that demonstrated poor overlay with the pharmacocluster as a whole. These outliers were compared against each other and against existing pharmacoclusters to find other possible matches. Those that did not fit any family were removed. Comparison between bound conformations was made based on the RMSD equations supplied in COMPARE.
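
The COMPARE program's own equations are not reproduced here. As an illustrative substitute, the sketch below computes a plain coordinate RMSD between bound conformations. The conformations are assumed to be already superimposed, with atoms listed in the same order. The sketch then groups conformations by single-linkage clustering at an RMSD threshold. Both the linkage rule and the threshold are assumptions, and the visual inspection step described above is not modeled.

```python
import math
from itertools import combinations


def rmsd(conf_a, conf_b):
    """RMSD between two conformations given as equal-length lists of (x, y, z)
    for the same atoms in the same order; no superposition is performed."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(conf_a, conf_b))
    return math.sqrt(sq / len(conf_a))


def single_linkage_clusters(conformations, threshold):
    """Group conformations (dict name -> coordinates) so that any two within
    `threshold` RMSD share a cluster. Returns sorted lists of names."""
    parent = {name: name for name in conformations}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(conformations, 2):
        if rmsd(conformations[a], conformations[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in conformations:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


confs = {
    "c1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "c2": [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    "c3": [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
}
print(single_linkage_clusters(confs, 0.5))  # [['c1', 'c2'], ['c3']]
```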

Eight pharmacoclusters were identified by this method, as shown in Figure 1. Visual inspection of the clusters in Figure 1 demonstrates that members within a cluster are substantially overlapped. Comparison between clusters demonstrates substantial differences. For example, the bound conformations in cluster 5 have an extended structure compared to the bound conformations in cluster 4, which form a horseshoe-like shape. Other differences include, for example, a flip in the nicotinamide ring between cluster 1 and cluster 2, such that the nicotinamide ring is anti to the ribose in cluster 1 and syn to the ribose in cluster 2. Another difference is a change in torsion angle in the bonds connecting the adenine ribose to the adenine phosphate for the bound conformations of cluster 3 compared to those of cluster 2.

The dihedral angles for various bonds in the bound conformations of the NADP(H) ligand can be used to distinguish the pharmacoclusters. As shown in Table 1 (see Figure 2 for atom and bond locations), although many dihedral angles are similar between two or more pharmacoclusters, each pharmacocluster can be distinguished from the others by comparison of the full set of dihedral angles. For example, pharmacoclusters 2 and 3 can be distinguished by their dihedral angles at O4'A-C4'A-C5'A-O5'A, which are 154 degrees and -131 degrees, respectively, and at C5'A-O5'A-PA-O3, which are 105 degrees and 57 degrees, respectively.
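To make such comparisons concrete, the sketch below computes a signed dihedral angle from four atomic positions and the smallest angular difference between two torsions. Because torsions are periodic, the O4'A-C4'A-C5'A-O5'A values quoted above (154 and -131 degrees) differ by 75 degrees rather than 285. The function names are hypothetical, and the sign convention (IUPAC, range -180 to 180 degrees) is an assumption, not necessarily the one used to build Table 1.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = [c / n for c in b1]  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles (360-degree periodic)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
print(angular_difference(154, -131))  # 75.0
```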
A quantitative analysis of the results of clustering bound conformations of NAD(P)(H) is provided in Table 2, which shows RMSD values calculated from comparisons between the average coordinates of the pharmacoclusters. Average coordinates were determined from the pharmacocluster subsets listed in Tables 3 through 10, as described below.
Tables 3A, 4A, 5A, 6A, 7A, 8A, 9A and 10A show RMSD values for subsets of members of pharmacoclusters 1-8, respectively. The RMSD value for each member was calculated by comparison to an average structure for the subset shown in the respective table. For each pharmacocluster, a subset of the ligands belonging to that cluster was identified. Each subset was chosen to maximize the diversity of the family and to minimize over-representation of ligand conformations from enzymes that are present multiple times in the PDB database. The goal of the subset selection was to fully represent characteristics of oxidoreductases from a range of species that catalyze a range of different reactions. For example, there are over ten alcohol dehydrogenases in the PDB database; however, for purposes of this study, only three, from three different species, were chosen for use in the 3D overlay and the pharmacophore construction.
Average coordinates for the above-described pharmacocluster subsets were obtained by overlaying the ligand structures in MSI Insight II using the overlay function. The three-dimensional coordinates of each atom in each ligand were used to calculate an average position and a standard deviation for the pharmacofamily.
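The averaging step can be sketched as follows, assuming the ligands have already been overlaid and share a common atom ordering. The text does not state how the per-atom standard deviation was defined; the root-mean-square distance of each atom from its mean position used here is an assumption, and `average_structure` is a hypothetical name.

```python
import math

def average_structure(conformations):
    """Per-atom mean position and spread over overlaid ligand conformations.

    `conformations` is a list of ligands, each a list of (x, y, z) tuples in
    the same atom order. The spread for each atom is the RMS distance of
    that atom from its mean position (a population estimate).
    """
    n = len(conformations)
    n_atoms = len(conformations[0])
    if any(len(conf) != n_atoms for conf in conformations):
        raise ValueError("all conformations must have the same atom count")
    means, spreads = [], []
    for i in range(n_atoms):
        points = [conf[i] for conf in conformations]
        mean = tuple(sum(p[k] for p in points) / n for k in range(3))
        msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points) / n
        means.append(mean)
        spreads.append(math.sqrt(msd))
    return means, spreads

ligands = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
means, spreads = average_structure(ligands)
print(means, spreads)  # [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)] [1.0, 1.0]
```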

Comparison of the RMSD values in part A of Tables 3 through 10 with the RMSD values in Table 2 demonstrates that a member of a pharmacocluster can be identified as such because its RMSD from the average conformation of its own pharmacocluster is lower than the RMSD between the average coordinates of the pharmacoclusters. In some cases it can be beneficial to combine two or more methods of comparison. For example, as described above, pharmacoclusters 2 and 3, which have a relatively low RMSD when compared to each other, can be distinguished from each other by visual inspection and by comparison of dihedral angles at various bonds.
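A member can be checked against this criterion by computing its RMSD to every pharmacocluster average and confirming that the closest average is its own cluster and lies nearer than the averages lie to each other. The sketch below assumes pre-aligned coordinates with matched atom ordering; `nearest_cluster` and the two-atom toy data are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD of two pre-aligned, atom-matched coordinate lists."""
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))

def nearest_cluster(member, averages):
    """Return (label, rmsd) of the cluster average closest to `member`."""
    scores = {label: rmsd(member, avg) for label, avg in averages.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

averages = {
    "cluster 2": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "cluster 3": [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],
}
member = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0)]
label, d = nearest_cluster(member, averages)
between = rmsd(averages["cluster 2"], averages["cluster 3"])
print(label, d < between)  # cluster 2 True
```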

These results demonstrate that bound 
conformations of a ligand can be grouped into 
pharmacoclusters by methods of structure comparison. 
These results also demonstrate methods for distinguishing 
pharmacoclusters and members within pharmacoclusters. 

Example II 

Correlation Between the Structure of Polypeptides and the Bound Conformations of NAD(P)(H)

This example describes a correlation between bound conformations of NAD(P)(H) and the structural classification of polypeptides, such that polypeptides of a pharmacofamily have a similar protein fold.

Conformations of NAD(P)(H) bound to oxidoreductase polypeptides were grouped into pharmacoclusters as described in Example I. For each polypeptide, the protein fold, SCOP superfamily designation and SCOP family designation were identified from the SCOP website administered by the Laboratory of Molecular Biology at the MRC, Cambridge, England (http://mrc-lmb.cam.ac.uk).

Table 11 shows the grouping of NAD(P)(H)-binding polypeptides into 8 pharmacofamilies.
Table 11: Pharmacofamilies

Family 1: NAD(P) Rossmann Binding Domain (anti)

PDB | Source | Polypeptide | Fold | SCOP Superfamily | SCOP Family
[illegible] | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1agn | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1dlt | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1axe | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1axg | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1cdo | Cod fish | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1deh | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1dls | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1hdx | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1hdy | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1hdz | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1hld | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1htb | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1kev | Cod liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1lde | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1ldy | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1teh | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1ykf | Thermoanaerobium | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
2ohx | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
[illegible] | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
3bto | Horse liver | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
3hud | Human | Alcohol dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Alcohol/glucose dehydrogenase
1dxy | Lactobacillus casei | D-2-hydroxyisocaproate dehydrogenase | NAD(P) binding Rossmann | NAD(P) binding Rossmann | Formate/glycerate dehydrogenase
The results shown in Table 11 demonstrate that the bound conformation of NAD(P)(H) can be correlated with protein fold: grouping oxidoreductases into pharmacofamilies based on the bound conformations of NAD(P)(H) resulted in a correlation with protein fold. Pharmacofamilies 1-3 consist of polypeptides having the NAD(P)(H)-binding Rossmann fold. Pharmacofamily 4 consists of polypeptides having the heme-linked catalase fold. Pharmacofamily 5 consists of polypeptides having the β/α TIM barrel fold. Pharmacofamily 6 consists of polypeptides having the dihydrofolate reductase fold. Pharmacofamily 7 consists of polypeptides having the FAD/NAD(P)(H)-binding domain fold. Trypanothione reductase was added to pharmacofamily 7 on the basis of the homology of its active site to the active sites of other members of pharmacofamily 7, independent of bound ligand conformation. Pharmacofamily 8 consists of polypeptides having the ferredoxin-like fold. Pharmacofamilies 1 and 2 were identified based on the anti or syn conformation, respectively, of the nicotinamide ring relative to the ribose. Additionally, a change in the torsion angles of the bonds connecting the adenine ribose to the adenine phosphate separates the family members having a Rossmann fold into a third pharmacofamily, identified as pharmacofamily 3.

The results described in this example demonstrate that a bound conformation of a ligand can be correlated with polypeptide fold. Furthermore, the results obtained by the method are consistent with results obtained by SCOP. Therefore, classification based on the bound conformations of ligands can be used to classify polypeptides according to structure.
Example III

Determination of a conformer model and pharmacophore for pharmacoclusters 1-8

This example demonstrates determination of the average bound conformations of pharmacoclusters 1-8 and construction of conformer models based on the average bound conformations. This example also demonstrates construction of a pharmacophore model based on the average bound conformations and their interactions with polypeptides.

Conformer models for each pharmacocluster were 
produced by determining an average structure for the 
subset of members of each pharmacocluster as described in 
Example I. The coordinates for conformer models of 
pharmacoclusters 1-8 are shown in Part C of Tables 3-10 
respectively. 

Pharmacophore models were constructed by aligning the active sites of the oxidoreductases in a pharmacofamily. Three-dimensional overlays were achieved using the Insight II overlay module to overlay the NAD(P) ligands of each enzyme-ligand complex. In each complex, heteroatoms in the surrounding protein that could function as hydrogen bond acceptors or hydrogen bond donors and that interacted with the NAD(P) ligand were identified. Heteroatoms that had common positions in three-dimensional space in each enzyme complex (within 3 Å of each other in the overlay) and that made a common interaction with the ligand were then grouped together and tabulated for pharmacophore construction. Water molecules were similarly identified and grouped. The grouped heteroatoms and water molecules are listed in Part D of Tables 3-10 below. Finally, the average coordinates and the standard deviation for each interaction group were calculated. The final pharmacophore model was produced by overlaying the interaction groups on the conformer model (average ligand structure).
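The grouping and averaging steps can be sketched as below. The greedy, centroid-based assignment and the `min_complexes` filter are assumptions made for illustration; the text specifies only that grouped atoms lie within 3 Å of each other in the overlay and make a common interaction with the ligand. The complex labels and coordinates are hypothetical.

```python
import math

def _centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def group_interactions(sites, cutoff=3.0, min_complexes=1):
    """Group overlaid protein heteroatoms (or waters) that share a position.

    `sites` holds (complex_id, (x, y, z)) pairs already in the overlay frame.
    A site joins the first group whose centroid lies within `cutoff`
    angstroms; otherwise it starts a new group. Groups seen in fewer than
    `min_complexes` distinct complexes are dropped. Returns a list of
    (mean_xyz, std, complex_ids), where std is the RMS distance of the
    grouped sites from their mean.
    """
    groups = []
    for complex_id, xyz in sites:
        for group in groups:
            if math.dist(_centroid([p for _, p in group]), xyz) <= cutoff:
                group.append((complex_id, xyz))
                break
        else:
            groups.append([(complex_id, xyz)])

    results = []
    for group in groups:
        ids = {cid for cid, _ in group}
        if len(ids) < min_complexes:
            continue
        pts = [p for _, p in group]
        mean = _centroid(pts)
        std = math.sqrt(sum(math.dist(p, mean) ** 2 for p in pts) / len(pts))
        results.append((mean, std, sorted(ids)))
    return results

sites = [("A", (0.0, 0.0, 0.0)), ("B", (0.5, 0.0, 0.0)),
         ("C", (0.4, 0.3, 0.0)), ("A", (10.0, 0.0, 0.0))]
groups = group_interactions(sites, cutoff=3.0, min_complexes=2)
print(len(groups), groups[0][2])  # 1 ['A', 'B', 'C']
```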

The coordinates for the pharmacophore models of pharmacoclusters 1-8 are shown in Parts B and C of Tables 3-10, respectively. Specifically, each conformer model includes the average NAD(P) coordinates (in Part C of each table), and each pharmacophore model includes the average NADP coordinates, the average water coordinates and the average protein heteroatom coordinates (including coordinates in both Parts B and C of each table). An exception is the pharmacophore model derived from pharmacofamily 7, which includes average water coordinates and average protein heteroatom coordinates for all polypeptides listed but has a conformer model derived from NAD(P) bound to each polypeptide listed except trypanothione reductase.

A structural representation of each conformer model with the overlaid interaction groups used to determine the respective pharmacophore models 1-8 is provided in Figure 3. The structures shown in Figure 3 reflect the average NAD(P) coordinates shown in Part C of Tables 3-10 and the coordinates of all interacting groups used to calculate the average water coordinates and the average protein heteroatom coordinates shown in Part D of Tables 3-10. Hydrogen bond acceptors are labeled with an 'A' followed by a number for each group. These are listed in the pharmacophore tables and designated on the pharmacophore figures. Donors are labeled with a 'D', and water molecules are labeled with a .

This example demonstrates construction of conformer models based on the bound conformations of ligands in pharmacoclusters. This example also demonstrates construction of a pharmacophore model based on the bound conformations of ligands in pharmacoclusters and their interactions with polypeptides in their respective pharmacofamilies.

Example IV 

Correlation Between the Bound Conformation of Ligands and a Conformation-Dependent Property

This example describes a conformation-dependent property that is correlated with a bound conformation of a ligand.

A 2D [1H,1H] NOESY spectrum was recorded with a 0.2 ml sample of 1 mM NADP and 200 µM of the enzyme 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DOXP). The spectrum was measured with a Bruker DRX700 spectrometer operating at a frequency of 700 MHz. The total measuring time was about 12 h.

The spectrum is shown in Figure 4, and atoms are identified according to Figure 2. The relative intensities of the observed transferred NOEs (trNOEs) between the ribose proton H-C1'N (NC1') and the protons on the nicotinamide ring, H-C4N and H-C2N, shown in Figure 4, reveal that NADP adopts a syn conformation when bound to the enzyme.

The bound conformations in pharmacoclusters 1 and 2 can be distinguished according to the anti or syn conformation, respectively, of the nicotinamide ring relative to the ribose. Therefore, these results demonstrate that the relative intensities of the observed trNOEs between the ribose proton H-C1'N (NC1') and the protons on the nicotinamide ring, H-C4N and H-C2N, can provide a conformation-dependent property useful in distinguishing members of pharmacoclusters 1 and 2.

Example V 

Binding compounds having specificity for one or more polypeptide pharmacofamilies

This example demonstrates querying a database of compounds to identify individual compounds having similar conformations. This example also demonstrates preferential binding of a compound to a polypeptide of one pharmacofamily over another.

Compounds TTE0001.001.A07 and TTE0001.002.D02 were identified by using the THREEDOM algorithm to query a database of commercially available molecules (ASINEX; Moscow, Russia) by shape matching with cibacron blue. Coordinates of cibacron blue were obtained from the published 3D structure (Li et al., Proc. Natl. Acad. Sci. USA 92:8846-8850 (1995)). The database was created by converting an SD format file of structures from ASINEX to INTERCHEM format coordinates using the batch2to3 program. Cibacron blue was compared against each structure in the database in multiple orientations to generate a matching score. Of the 37,926 structures searched, the 750 with the best matching scores were selected. From these 750 structures, TTE0001.001.A07 and TTE0001.002.D02 were selected and purchased based on objective criteria such as likely favorable binding interactions, pharmacophore properties, synthetic accessibility, and likely pharmacokinetic, toxicological, absorption and metabolic properties.

Kinetic studies were carried out in 1-cm cuvettes in a 1 mL volume at 25 °C. Lactate dehydrogenase reactions were monitored spectrophotometrically with a Cary 300 by following the decrease in absorbance at 340 nm due to the oxidation of NADH by pyruvate. Lactate dehydrogenase reaction mixtures contained 100 mM Hepes buffer at pH 7.4, as well as 2.5 mM pyruvate, 10 µM NADH and 5 ng/mL lactate dehydrogenase. NADPH, NADH, Hepes buffer, and rabbit muscle lactate dehydrogenase were purchased from Sigma. Cytochrome P450 reductase reactions were monitored by following the increase in absorbance at 550 nm due to the reduction of ferric cytochrome c by NADPH. Cytochrome P450 reductase reaction mixtures contained 100 mM Hepes buffer at pH 7.4, as well as 80 µM ferric cytochrome c, 10 µM NADPH and 80 ng/mL cytochrome P450 reductase. Data were fitted using the FORTRAN programs of Cleland (Adv. Enzymol. 45:273-387 (1977)), which perform nonlinear least-squares fits to the appropriate equations. Substrates were varied around their Michaelis constants, while the nonvaried substrate was kept at a concentration close to its Michaelis constant.
The inhibitor concentration that gives 50% inhibition (IC50) was obtained by fitting data to the equation for a line, where Y values are 1/rate and X values are the concentration of inhibitor, as in a Dixon plot (Segel, supra). The IC50 is the magnitude of the X-intercept. If a full kinetic profile was done, then Kis values were obtained by fitting the data to the equation for a competitive inhibitor:

$$\mathrm{rate} = \frac{V_{max} A}{K_a \left(1 + I/K_{is}\right) + A}$$

where rate is the rate of reaction in units of absorbance/minute, Vmax is the maximum velocity, Ka is the Michaelis constant for A, Kis is the inhibition dissociation constant for the inhibitor, I is the inhibitor concentration, and A is the concentration of NADH or NADPH. In all cases, the fit to the above equation was used only after establishing that fits to the equations for noncompetitive and uncompetitive inhibition were less appropriate, based on the values of sigma (overall fit) and the standard deviations of the fitted constants (Kis and Kii).
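The two fits described above can be sketched in a few lines. This is not the Cleland FORTRAN code: `competitive_rate`, `fit_line` and `dixon_ic50` are hypothetical helpers, and the Dixon fit here is an unweighted least-squares line. Under the competitive equation, 1/rate is linear in I and crosses the X axis at -Kis(1 + A/Ka), so the IC50 recovered from the Dixon fit equals Kis(1 + A/Ka); the simulated data below use that to check the result.

```python
def competitive_rate(A, I, Vmax, Ka, Kis):
    """Rate for competitive inhibition: Vmax*A / (Ka*(1 + I/Kis) + A)."""
    return Vmax * A / (Ka * (1.0 + I / Kis) + A)

def fit_line(xs, ys):
    """Unweighted least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dixon_ic50(inhibitor, rates):
    """IC50 from a line fitted to 1/rate versus inhibitor concentration.

    The line crosses the X axis at -IC50, so the magnitude of the
    X-intercept is returned.
    """
    slope, intercept = fit_line(inhibitor, [1.0 / r for r in rates])
    x_intercept = -intercept / slope
    return abs(x_intercept)

# Simulated data: Vmax = 1, Ka = 10, Kis = 2 and A = 10 (arbitrary units),
# so the IC50 should equal Kis * (1 + A / Ka) = 4.
conc = [0.0, 1.0, 2.0, 4.0, 8.0]
rates = [competitive_rate(10.0, i, Vmax=1.0, Ka=10.0, Kis=2.0) for i in conc]
print(round(dixon_ic50(conc, rates), 6))  # 4.0
```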

As shown in Figure 5, compound TTE0001.001.A07 inhibited binding of NADH to lactate dehydrogenase and of NADPH to cytochrome P450 reductase, which are polypeptide members of pharmacofamilies 1 and 8, respectively. Compound TTE0001.001.A07 demonstrated high binding affinity for both lactate dehydrogenase and cytochrome P450 reductase.
Analysis of inhibition of binding between NADH and lactate dehydrogenase is shown in Figure 6. Compound TTE0001.002.D02 inhibited lactate dehydrogenase with a Kis of 2.1 µM. Similar measurements of cytochrome P450 reductase with concentrations of compound TTE0001.002.D02 up to 0.5 mM did not indicate inhibition. These results indicated that compound TTE0001.002.D02 had a Kis of greater than 1 mM with cytochrome P450 reductase. Thus, compound TTE0001.002.D02 demonstrated preferential binding to pharmacofamily 1, with an inhibitory dissociation constant (Kis) at least 500-fold lower than for pharmacofamily 8.

The results described in this example demonstrate that a binding compound can be identified by structural comparison to a bound conformation of a ligand. Furthermore, the results demonstrate that binding compounds that interact with polypeptides from multiple pharmacofamilies, or compounds that preferentially bind to polypeptides of one pharmacofamily compared to polypeptides of another pharmacofamily, can be identified by structural comparison to a bound conformation of a ligand.

Example VI 

Identification of a ligand using a pharmacophore model 

This example demonstrates construction of a pharmacophore model, use of the model to identify a binding ligand, and confirmation of the ability of the identified compound to bind a polypeptide member of the pharmacofamily from which the pharmacophore model was derived.
Pharmacophore models were constructed to include part or all of the NAD(P) shape, hydrogen bond donors, hydrogen bond acceptors and/or other chemical features described in Tables 3-10. The combination of chemical features for each search pharmacophore in a search set was chosen in an attempt to cover a diverse range of combinations of possible chemical interactions and to represent the protein-ligand interactions that occur most frequently in the particular pharmacofamily.

Pharmacophore shape was derived using the program CATALYST and was calculated using the van der Waals surface of part or all of the structure of the averaged NAD(P) coordinates determined for a pharmacocluster. Desired hydrogen bonding features, water molecules and other chemical motifs were positioned in the pharmacophore model using the average coordinates determined for both the pharmacofamily and the pharmacocluster.

The components of a pharmacophore model derived from the coordinates presented in Table 3 for pharmacofamily 1 are shown in Figure 7. Figure 7A shows the structure of the conformer model having the coordinates listed in Table 3C, with a superimposed volume, indicated by grey spheres, defining the shape of the ligand. A hydrophobic feature was added to the pharmacophore model at the average position of the hydrophobic region of the nicotinamide ring, as shown in Figure 7B. Also shown in Figure 7B is a hydrogen bond acceptor positioned at the average coordinates for the pyrophosphate, using the averaged coordinates for the location of hydrogen bond acceptors utilized in all 17 polypeptides of the pharmacofamily. Finally, Figure 7B shows a hydrogen bond donor positioned where a hydrogen bond donor of a ligand would be expected to have favorable interactions with hydrogen bond acceptors observed in 11 of the polypeptides of pharmacofamily 1. Thus, this hydrogen bond donor does not identify the position of an actual hydrogen bond donor in the NAD(P) ligand, but rather a location where a potential ligand's hydrogen bond donor could make favorable interactions with the polypeptides of pharmacofamily 1. Figure 7C shows the combined features of Figures 7A and 7B present in a pharmacophore model used to search a database of compounds.

To identify potential ligands that bind to polypeptides of pharmacofamily 1, computational searches were conducted using CATALYST. Searches were made by comparing the shape and combination of chemical features of the pharmacophore model, shown in Figure 7, to the shape and features of molecules in the database.
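At its simplest, the feature part of such a search asks whether a candidate molecule, once aligned to the model, places a feature of the right type near each pharmacophore feature. The sketch below shows only that test and is not how CATALYST works internally: the `Feature` class, the 1.5 Å tolerance radius, the pre-aligned input and all coordinates are assumptions, and conformational flexibility, alignment and the shape (volume) comparison are omitted.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Feature:
    kind: str                        # "hydrophobic", "acceptor" or "donor"
    xyz: Tuple[float, float, float]  # position in the pharmacophore frame (angstroms)
    radius: float = 1.5              # tolerance sphere; illustrative value

def matches(pharmacophore: Sequence[Feature], ligand: Sequence[Feature]) -> bool:
    """True if each pharmacophore feature has a same-kind ligand feature
    inside its tolerance sphere (ligand assumed already aligned)."""
    return all(
        any(g.kind == f.kind and math.dist(g.xyz, f.xyz) <= f.radius for g in ligand)
        for f in pharmacophore
    )

# Hypothetical three-feature model echoing the features of Figure 7B.
model = [
    Feature("hydrophobic", (0.0, 0.0, 0.0)),
    Feature("acceptor", (6.0, 1.0, 0.0)),
    Feature("donor", (9.0, 4.0, 0.0)),
]
hit = [Feature("hydrophobic", (0.4, 0.2, 0.0)),
       Feature("acceptor", (6.5, 1.0, 0.3)),
       Feature("donor", (8.8, 3.5, 0.0))]
miss = hit[:2]  # lacks the donor feature
print(matches(model, hit), matches(model, miss))  # True False
```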

An example of a compound identified using the pharmacophore model shown in Figure 7C is TTE0008.025.D08. Using a binding assay similar to that described in Example V, compound TTE0008.025.D08 was shown to have inhibitory activity against the pharmacofamily 1 member lactate dehydrogenase (IC50 = 50 µM).
Example VII

Identification of new members of a pharmacofamily using sequence models of pharmacofamilies

This example demonstrates the construction of Hidden Markov Models based on pharmacofamilies. This example also demonstrates validation of the Hidden Markov Models in identifying, from a large sequence database, both members of the pharmacofamilies used to produce the Hidden Markov Models and new members that were not used to produce the models.

The polypeptides of pharmacofamilies 3 and 5 were each structurally aligned with PrISM (Yang & Honig, J Mol Biol. 301:691-711 (2000)). Hidden Markov Models were produced using the aligned polypeptides of each pharmacofamily as a training set in HMMER 2.1 with default options (Sean Eddy, unpublished; Department of Genetics, Washington University, St. Louis). The models were calibrated using HMMER.

The Hidden Markov Models were used to search the PDB for members of the respective pharmacofamilies. The PDB was used as a test database to validate the models because structural and functional information is available for each polypeptide, allowing accurate confirmation of whether a polypeptide identified with the Hidden Markov Models belongs to a pharmacofamily.

The PDB sequence library was searched with the Hidden Markov Models using the HMMER 2.1 algorithm. Polypeptide sequences identified by searching with the Hidden Markov Models were ranked according to the E value score produced by the HMMER program.

The search performed with the Hidden Markov Model derived from pharmacofamily 5 returned a set of polypeptides having E values significantly less than 1, as shown in Table 12. Figure 8 shows a plot of -ln(E) vs. L for the data of Table 12, where L is the position of each identified sequence in the list shown in Table 12. Because of the low E values, all of the polypeptides shown in Table 12 were compared to a validation set as described below.
The search performed with the Hidden Markov Model derived from pharmacofamily 3 returned a set of polypeptides in which all but one identified polypeptide had an E value significantly less than 1, as shown in Table 13. A significant increase in E value was observed between the penultimate and last identified polypeptides in the list ranked by E value, as shown in Table 13. This drop position is also evident in a plot of -ln(E) vs. L, as shown in Figure 9. Because of this large drop position, all polypeptides except the final polypeptide shown in Table 13 were compared to a validation set as described below.
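Locating such a drop position can be automated by ranking hits by E value and finding the largest gap in -ln(E) between neighbors, as sketched below. This largest-gap rule is an illustrative assumption rather than a procedure stated in the text or provided by HMMER, `largest_drop` is a hypothetical name, and the E values in the example are invented.

```python
import math

def largest_drop(evalues):
    """Locate the largest gap in -ln(E) between consecutive ranked hits.

    `evalues` must be sorted from most to least significant (increasing E)
    and be positive. Returns (L, gap), where L is the 1-based rank of the
    last hit before the gap.
    """
    scores = [-math.log(e) for e in evalues]
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return i + 1, gaps[i]

ev = [1e-80, 1e-75, 1e-70, 1e-60, 5.0]
rank, gap = largest_drop(ev)
print(rank)  # 4
```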
Comparison to a validation set was carried out as follows. The predictive ability of each model was confirmed by comparing the polypeptides identified by the search of the PDB to a validation set including members of the respective pharmacofamily. The ratio of false positives (RFP) and the ratio of true positives (RTP) were calculated for the set of polypeptides identified from the searches described above. A positive is a polypeptide identified as corresponding to the Hidden Markov Model used. The RFP is the ratio of the number of false positives returned by the search to the number of positives returned by the search, where a false positive is a polypeptide identified as corresponding to the Hidden Markov Model used that is not a member of the validation set. The RTP is the ratio of the number of true positives returned by the search to the number of true positives in the database. Optimal results would have a low RFP and a high RTP.
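Both ratios follow directly from set operations on the returned hits and the validation set, as in the sketch below. The identifiers are placeholders, `rfp_rtp` is a hypothetical name, and the validation set is assumed to contain exactly the pharmacofamily members present in the searched database (so its size is the number of true positives in the database).

```python
def rfp_rtp(hits, validation_members):
    """Return (RFP, RTP) as fractions.

    hits: identifiers returned by the search (the positives).
    validation_members: identifiers of the pharmacofamily members present
    in the searched database (assumed to be all true positives).
    """
    hits, members = set(hits), set(validation_members)
    if not hits or not members:
        raise ValueError("need at least one hit and one validation member")
    false_positives = len(hits - members)
    true_positives = len(hits & members)
    return false_positives / len(hits), true_positives / len(members)

rfp, rtp = rfp_rtp(["A", "B", "C", "D", "X"], ["A", "B", "C", "D"])
print(f"RFP {100 * rfp:.0f}%, RTP {100 * rtp:.0f}%")  # RFP 20%, RTP 100%
```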

Comparison of the identified polypeptides to the original training set was used to identify new members of pharmacofamily 3. New members can be identified as those having (1) a function similar to members of pharmacofamily 3, (2) a protein fold similar to members of pharmacofamily 3, and/or (3) a bound ligand having a conformation similar to pharmacocluster 3. Polypeptides identified by searching the PDB with the pharmacofamily 3 model and not present in the training set (training set 1) included uridine diphosphogalactose 4-epimerase, dTDP-glucose 4,6-dehydratase, GDP-mannose 4,6-dehydratase, sulfolipid biosynthesis protein, and alcohol dehydrogenase.
Newly identified members of pharmacofamily 3 were combined with the members of training set 1 to form training set 2. A new sequence model was produced from training set 2, and the PDB was searched as described above. A plot of -ln(E) vs. L for the results of searching the PDB with the sequence model derived from the second pharmacofamily 3 training set is shown in Figure 10. Comparison of the plots in Figures 9 and 10 shows that the second training set, which was improved by adding more members, gave a larger difference in E values at the curve inflection occurring just prior to -ln(E)=0, or E=1. This statistically significant inflection can be used to identify an E value cutoff of E=1.

Table 14 shows the RTP and RFP values (expressed as percent RTP and percent RFP) obtained for searches of the PDB with Hidden Markov Models derived from pharmacofamily 5 and from the second training set of pharmacofamily 3, at E value cutoffs of 1 and 10.

Table 14: Results of PDB search with Hidden Markov Models

Pharmacofamily | E value cutoff | RFP % | RTP %
3 (training set 2) | 1 | 0 | 100
3 (training set 2) | 10 | 20 | 100
5 | 1 | 0 | 100
5 | 10 | 0 | 100

As shown in Table 14, the Hidden Markov Models produced from pharmacofamilies 3 and 5 could be used to accurately identify the members of the respective pharmacofamilies in the PDB. Specifically, with an E value cutoff of 1, the Hidden Markov Models identified all of the members of the respective pharmacofamilies, as indicated by an RTP of 100%, and did not falsely identify non-members in the database, as indicated by an RFP of 0%.

Example VIII 

Identification of new members of a pharmacofamily by differential filtering

This example demonstrates the construction of Hidden Markov Models based on different subsets of positions in the structurally aligned members of pharmacofamily 1. In addition, this example demonstrates searching a sequence database by differential filtering and validation of differential filtering in identifying pharmacofamily members in a large sequence database. Furthermore, this example demonstrates identification of a new member of a pharmacofamily using differential filtering.

Polypeptides in pharmacofamily 1 were structurally aligned with PrISM, and a first Hidden Markov Model was produced for the aligned polypeptides using HMMER 2.1 as described in Example VII. The training set for the first Hidden Markov Model includes all of the residues shown in Figure 11. The PDB sequence library was searched with the first Hidden Markov Model as described in Example VII.

A second Hidden Markov Model was built to emphasize the binding site region by setting as match states only those residues having at least one atom within 4.5 angstroms of the binding site. The residues within 4.5 angstroms of the binding site that were used to train the second Hidden Markov Model are shown in bold in Figure 11. A SELEX-formatted sequence file was generated with HMMER and edited to designate as match states only the residues having any atom within 4.5 angstroms of the cofactor binding site. Positions not marked as match states by HMMER in the initial generation of the SELEX file, due to insufficient positional population in the alignment, were not marked as match states even if they corresponded to residues close to the cofactor binding site. This sequence file was used (with the --hand option of HMMER) to create a Hidden Markov Model modeling only the sequence motifs. The model was calibrated using HMMER.
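The residue selection and the rule for match states can be sketched as follows. The data layout (a mapping from residue id to atom coordinates, alignment columns as integers), the residue labels and the function names are hypothetical, and the steps that map residues onto alignment columns and write the SELEX annotation for HMMER are not shown.

```python
import math

def binding_site_residues(residues, ligand_atoms, cutoff=4.5):
    """Residue ids having at least one atom within `cutoff` angstroms of any ligand atom.

    residues: dict of residue id -> list of (x, y, z) atom positions.
    ligand_atoms: list of (x, y, z) positions of the bound cofactor.
    """
    return [
        res_id
        for res_id, atoms in residues.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms)
    ]

def match_state_columns(site_columns, initial_match_columns):
    """Alignment columns kept as match states: close to the binding site AND
    already marked as match columns when the SELEX file was first generated."""
    return sorted(set(site_columns) & set(initial_match_columns))

ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {"G12": [(0.0, 3.0, 0.0), (0.0, 4.0, 0.0)],
            "D33": [(1.5, 4.4, 0.0)],
            "K90": [(9.0, 9.0, 9.0)]}
print(binding_site_residues(residues, ligand))       # ['G12', 'D33']
print(match_state_columns([3, 5, 9], [1, 3, 4, 5]))  # [3, 5]
```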

The search performed with the first Hidden Markov Model derived from pharmacofamily 1 returned a set of polypeptides having E values in a range including values less than and greater than 1, as shown in Table 15. In contrast to the results presented in Example VII for pharmacofamily 3, a large inflection was not observed in a plot of -ln(E) versus L, as shown in Figure 12. Therefore, differential filtering was used to reduce the ratio of false positives identified while minimizing the reduction in the ratio of true positives identified.

Differential filtering combining searches with
the first Hidden Markov Model and the binding site region
Hidden Markov Model was performed as follows.
Polypeptides returned from the above-described search
with the first Hidden Markov Model derived from
pharmacofamily 1 and having E values smaller than 1 were
combined into a second sequence library. This second
sequence library was searched by the binding site region
Hidden Markov Model derived from pharmacofamily 1. The
set of polypeptides returned from this differential
search is shown in Table 16. A plot of -ln(E) vs. L for
the sequences of Table 16 is shown in Figure 13.
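The two-stage search just described can be sketched as follows. The callable `search_site_model` is a hypothetical stand-in for running the binding site region model against the reduced library (for example, by wrapping an HMMER search). It must search the reduced library itself, because E values depend on the size of the database searched. The default cutoffs (first-model E below 1, binding site E at or below 10) follow the values used in this example but are parameters of the sketch, not fixed features of the method.

```python
import math
from typing import Callable, Dict, List


def differential_filter(
    first_hits: Dict[str, float],
    search_site_model: Callable[[List[str]], Dict[str, float]],
    first_cutoff: float = 1.0,
    site_cutoff: float = 10.0,
) -> List[str]:
    """Two-stage (differential) filtering of sequence-model search hits.

    first_hits maps sequence IDs to E values from the full sequence model.
    search_site_model is a hypothetical callable that searches a list of
    sequence IDs with the binding site region model and returns their E values.
    """
    # Stage 1: the second sequence library holds first-model hits with E < first_cutoff.
    library = sorted(seq for seq, e in first_hits.items() if e < first_cutoff)
    if not library:
        return []
    # Stage 2: search only that reduced library with the binding site model.
    site_hits = search_site_model(library)
    # Keep sequences the binding site model reports at or below site_cutoff;
    # sequences it does not report at all are treated as E = infinity.
    return [seq for seq in library if site_hits.get(seq, math.inf) <= site_cutoff]
```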

The polypeptides returned from the differential
search and having various E value ratios were compared to
a validation set as described in Example VII. The RFP%
and RTP% obtained for the search based on the full
sequence Hidden Markov Model and for the differential
filtering search are shown in Table 17. The rows labeled
"full sequence HMM" show the results of searching the PDB
with the first sequence model alone at E value cutoffs of
1 and 10. The rows labeled "differential" show the results
of differential filtering, in which the sequences
identified by the first model were searched again with the
second (binding site region) model. Specifically, one
differential row shows the results of searching the
sequences identified by the first model at E=10 with the
second model at E=10 (E value ratio 1:1), and the other
shows the results of searching the sequences identified by
the first model at E=1 with the second model at E=10
(E value ratio 1:10).

Table 17: Results of PDB search compared to original
validation set

Search              E value     E value binding   E value   RFP%   RTP%
                    first HMM   site HMM          ratio
full sequence HMM   1           NA                NA         9     100
differential        1           10                1:10       8      99
full sequence HMM   10          NA                NA        48     100
differential        10          10                1:1       39      99

As shown in Table 17, differential filtering
reduced the RFP (from 9% to 8% at a first-model E value
cutoff of 1, and from 48% to 39% at a cutoff of 10) with
little or no effect on the RTP (100% versus 99%) as
compared between respective E value cutoffs. The results
of Table 17 also show that by adjusting the E value ratio,
a substantially lower RFP can be achieved with only minor
effects on the RTP.
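RFP and RTP are defined in Example VII, which is not reproduced in this section. Purely for illustration, the sketch below assumes that RFP% is the percentage of returned sequences absent from the validation set and that RTP% is the percentage of validation set members that are returned. If Example VII defines them differently, the two formulas should be adjusted. Re-running the function with an expanded validation set shows how newly recognized members move from false positives to true positives.

```python
from typing import Iterable, Set, Tuple


def rfp_rtp(returned: Iterable[str], validation: Set[str]) -> Tuple[float, float]:
    """Return (RFP%, RTP%) under the assumed definitions stated above."""
    hits = set(returned)
    if not hits or not validation:
        raise ValueError("need at least one returned sequence and one validation member")
    false_positives = hits - validation  # returned but not known members
    true_positives = hits & validation   # returned and known members
    rfp = 100.0 * len(false_positives) / len(hits)
    rtp = 100.0 * len(true_positives) / len(validation)
    return rfp, rtp
```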




Polypeptides identified by differential
filtering and not present in a pharmacofamily 1
validation set can be identified as new members of
pharmacofamily 1. New members can be identified as those
having (1) a function similar to members of
pharmacofamily 1, (2) a protein fold similar to members
of pharmacofamily 1, and/or (3) a bound ligand having a
conformation similar to pharmacocluster 1. By these
criteria, the polypeptide D-glycerate dehydrogenase was
identified as a new member of pharmacofamily 1.

An improvement in the ability of differential 
filtering to accurately and specifically identify members 
of pharmacofamily 1 can be achieved by adding newly 
identified members to the original validation set to 
create an expanded validation set. Table 18 presents the 
RFP and RTP values obtained when the polypeptides 
produced by differential filtering were compared to the 
expanded validation set containing the newly added
polypeptide D-glycerate dehydrogenase.

Table 18: Results of PDB search compared to expanded
validation set

Search              E value     E value binding   E value   RFP%   RTP%
                    first HMM   site HMM          ratio
full sequence HMM   1           NA                NA         3     100
differential        1           10                1:10       2      98
full sequence HMM   10          NA                NA        45     100
differential        10          10                1:1       36      98

Comparison of the results from the original
validation set shown in Table 17 with the results from
the expanded validation set shown in Table 18 indicates a
reduction in RFP (for example, from 9% to 3% for the full
sequence Hidden Markov Model at an E value cutoff of 1)
with only a minor reduction in RTP.



Example IX

Identification of members of pharmacofamily 1 in the TB proteome



This example demonstrates searching the TB
proteome with full sequence Hidden Markov Models derived
from various pharmacofamilies. This example demonstrates
identification of potential functions for sequences in a
proteome for which a function has not yet been assigned.
This example also demonstrates determination of the
pharmacofamily to which a newly identified sequence most
likely belongs.



Full sequence Hidden Markov Models were
produced for pharmacofamilies 1, 2, 3, 5, and 6 as
described in Example VII. The full sequence Hidden
Markov Models were used for single sequence searches of
the TB proteome essentially as described in Example VII.
The TB proteome has been described in Cole et al., Nature
393:537-544 (1998).



The results of a search with the full sequence
Hidden Markov Model derived from pharmacofamily 1 are
shown in Table 20. As shown in Table 20, a number of
"putative" or "probable" dehydrogenase sequences having
relatively low E values were identified in the proteome.
Examples of these dehydrogenases are indicated in bold
font in Table 20. These results indicate that a
sequence model derived from a pharmacofamily can be used
to identify potential new members of a protein family in
a proteome containing sequences encoding polypeptides of
unknown function.

Comparison of the E values obtained for a
specific sequence identified from searches with full
sequence Hidden Markov Models derived from multiple
pharmacofamilies can be used to determine to which
pharmacofamily an identified sequence most likely
belongs. In a representative result, a sequence in the
TB proteome annotated as 'putative dehydrogenase Rv
1245c' was predicted to belong to dehydrogenase
pharmacofamily 3 with an E value of 5x10"^® and to
dehydrogenase pharmacofamily 1 with an E value of 55.
According to searches with full sequence Hidden Markov
Models derived from pharmacofamilies 2, 5, and 6, there
was no significant probability (no sufficiently small E
value) that the protein belonged to pharmacofamilies 2, 5,
or 6. Thus, it was concluded that 'putative dehydrogenase
Rv 1245c' is a member of pharmacofamily 3.
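The comparison above amounts to choosing, among the pharmacofamily models, the one giving the smallest E value, provided that E value is small enough to be significant. A minimal sketch follows; the function name is hypothetical, and the significance threshold of 1 is an assumption, since the text gives no explicit threshold.

```python
from typing import Dict, Optional


def most_likely_family(evalues: Dict[int, float], threshold: float = 1.0) -> Optional[int]:
    """Return the pharmacofamily whose model gives the smallest significant E value.

    evalues maps pharmacofamily numbers to the E value of one query sequence
    against each family's full sequence model. Returns None when no family
    has an E value below the (assumed) significance threshold.
    """
    significant = {fam: e for fam, e in evalues.items() if e < threshold}
    if not significant:
        return None
    return min(significant, key=significant.get)
```

With a very small E value for pharmacofamily 3 and an E value of 55 for pharmacofamily 1, as in the Rv 1245c result, the function returns 3.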

These results indicate that it was possible to
make a statistically significant prediction about the
pharmacofamily to which 'putative dehydrogenase Rv 1245c'
belongs based solely on comparison to sequence models for a
variety of pharmacofamilies. Thus, even in the absence
of functional characterization of 'putative dehydrogenase
Rv 1245c', a ligand geometry can be identified by
comparison to pharmacocluster 3 according to the methods
described herein. Based on this ligand geometry, a
binding compound can be identified or designed that will
specifically bind to 'putative dehydrogenase Rv 1245c'.

This example demonstrates that, once built and
verified, sequence models derived from various
pharmacofamilies can be used to provide pharmacofamily
annotation of a proteome. Sequences that cannot be
adequately annotated by other methods can be identified
as members of a pharmacofamily in this way. Furthermore,
once identified, polypeptides encoded by newly identified
sequences can be targeted with an appropriate binding
compound identified or designed based on the appropriate
pharmacocluster.

Coordinates for the conformer and pharmacophore
models and the data used in their construction are
presented in Tables 3-10 below. Part A of each Table lists
the subset of structures used in constructing the model,
including molecule numbers for cross-referencing between
parts A-C, the PDB accession number, the name of the
polypeptide, and the RMSD from the pharmacocluster
average. Part B of each Table lists the average
coordinates for heteroatoms and waters of the
pharmacophore model and includes the atom name (cross-
referenced to part D), designation of interaction ("ACC,"
acceptor; "DON," donor; and "T," water), total number of
atoms included in the calculation of the average, and X,
Y, Z coordinates with respective standard deviations (σ).
Part C of each Table lists the coordinates of the
conformer model using the atom designations of Figure 2
and X, Y, Z coordinates with respective standard
deviations (σ). Part D of each Table lists the
coordinates for interacting molecules used to determine
the pharmacophore model, including the atom name, residue
molecule # (which identifies the residue type and
molecule number cross-referenced to Part A), residue
number from the PDB structure, total number of atoms
summed for the average coordinates, and X, Y, Z
coordinates with respective standard deviations (σ). The
bolded entries in part D correspond to the average values
reported in part B. Atom names are identified according
to IUPAC recommendations as described, for example, in
Markley et al., Pure and Appl. Chem. 70:117-142 (1998).
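The averaged coordinates and standard deviations reported in Parts B through D can be computed per atom roughly as sketched below. The sketch assumes the input coordinates have already been superimposed in a common frame and uses the population standard deviation; the tables do not state which estimator was used. The `AveragedAtom` record is a hypothetical layout mirroring the Part B columns.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class AveragedAtom:
    """One Part B-style row: atom name, interaction type, count, mean and sigma per axis."""
    name: str
    interaction: str  # e.g. "ACC" (acceptor) or "DON" (donor)
    n_atoms: int
    mean: Coord
    sigma: Coord


def average_atom(name: str, interaction: str, coords: Sequence[Coord]) -> AveragedAtom:
    """Average superimposed copies of one atom and report per-axis spread."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    xs, ys, zs = zip(*coords)  # split into per-axis sequences
    mean = (statistics.fmean(xs), statistics.fmean(ys), statistics.fmean(zs))
    # Population standard deviation (assumption); statistics.stdev gives the sample estimator.
    sigma = (statistics.pstdev(xs), statistics.pstdev(ys), statistics.pstdev(zs))
    return AveragedAtom(name, interaction, len(coords), mean, sigma)
```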

EXAMPLE X 

Use of Natural Log E-Value Ratios in Determining 
Pharmacofamily Membership Based on Sequence Models

This example demonstrates identification of 
pharmacofamily members based on relative scores for E 
values of candidate members identified from searching a 
database with a sequence model. The method is 
particularly useful for identifying members of a 
pharmacofamily in cases where the differences in E values
between members and nonmembers are relatively small.

Polypeptides in pharmacofamily 1 were 
structurally aligned with PrISM and a Hidden Markov Model 
was produced for the aligned polypeptides using HMMER 2.1 
as described in Example VII. The training set for the 
first Hidden Markov Model includes all of the residues 
shown in Figure 11. The PDB sequence library was 
searched with the first Hidden Markov Model as described 
in Example VII. 

The search performed with the Hidden Markov 
Model derived from pharmacofamily 1 returned a set of 
polypeptides having E values in a range including values 
less than and greater than 1 as shown in Table 15. In 
contrast to the results presented in Example VII for 
pharmacofamily 3, a large inflection was not observed in 
a plot of -ln(E) versus L as shown in Figure 12. 

The following method was used to more clearly
identify the demarcation between members and nonmembers
of pharmacofamily 1. The ratio of the -ln(E) for the
sequence compared against pharmacofamily 1 to the summed
-ln(E) for pharmacofamilies 1 through 8 was calculated.
This ratio is here referred to as XCorr (for cross
correlation):

$$\mathrm{XCorr}_i = \frac{-\ln(E_i)}{\sum_{j=1}^{N} \left(-\ln(E_j)\right)} = \frac{\ln(E_i)}{\sum_{j=1}^{N} \ln(E_j)}$$

where E_i is the E value of the sequence against the
Hidden Markov Model for pharmacofamily i and N is the
total number of pharmacofamilies in the analysis.
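A direct transcription of the XCorr ratio into Python is sketched below. The dictionary input (pharmacofamily number to E value for one sequence) is an assumption of the sketch. E values must be positive, and the ratio is undefined when the logarithms sum to zero.

```python
import math
from typing import Dict


def xcorr(evalues: Dict[int, float]) -> Dict[int, float]:
    """XCorr_i = ln(E_i) / sum_j ln(E_j) for one sequence over N pharmacofamilies."""
    if any(e <= 0 for e in evalues.values()):
        raise ValueError("E values must be positive")
    logs = {fam: math.log(e) for fam, e in evalues.items()}
    total = sum(logs.values())
    if total == 0:
        raise ValueError("XCorr is undefined when the ln(E) values sum to zero")
    return {fam: lg / total for fam, lg in logs.items()}
```

By construction, the XCorr values of one sequence sum to 1 across the families included. A value above 1 for one family is therefore paired with a negative value for another, which occurs when one E value is very small and another exceeds 1.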



As shown in Figure 14, where the triangles
represent the XCorr values (multiplied by 100 for
purposes of expressing as a percentage), a significant
'break point' in XCorr values occurred at the same
location in the sequence list as that identified by
differential filtering (see Example VIII). In
particular, the break point occurred where XCorr dropped
from the neighborhood of 100% to the neighborhood of
zero. All sequences above the break point (having higher
-ln(E) values than those at the break point) are members
of pharmacofamily 1, and all sequences below the break
point (having -ln(E) values less than those at the break
point) are not members of pharmacofamily 1.
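One simple way to locate the break point automatically is sketched below: order the sequences by increasing E value (equivalently, decreasing -ln(E)) and cut after the largest drop in XCorr between consecutive sequences. This largest-drop rule is an assumption; the text identifies the break point from the plot in Figure 14 and does not specify an algorithm.

```python
from typing import List, Sequence, Tuple


def split_at_break_point(
    rows: Sequence[Tuple[str, float, float]],
) -> Tuple[List[str], List[str]]:
    """Split (sequence ID, E value, XCorr) rows into (members, non-members).

    Rows are ordered by increasing E value, i.e. decreasing -ln(E); the break
    point is placed after the largest drop in XCorr between consecutive rows.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to find a break point")
    ordered = sorted(rows, key=lambda r: r[1])
    x = [r[2] for r in ordered]
    drops = [x[i] - x[i + 1] for i in range(len(x) - 1)]
    cut = max(range(len(drops)), key=drops.__getitem__) + 1
    names = [r[0] for r in ordered]
    return names[:cut], names[cut:]
```

A sequence that lands below the cut but still has a high XCorr is not promoted by this rule; such cases are discussed separately in the text.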



In general, each sequence member of
pharmacofamily 1 had an XCorr value near 100%, indicating
that the probability that the sequence belongs to the
specified pharmacofamily is much higher than the
probability that it belongs to a different
pharmacofamily. Sequences with an XCorr value close to
zero for a given pharmacofamily have a greater
probability of belonging to another pharmacofamily.

Those sequences that are below the break point
in Figure 14 but have XCorr values significantly greater
than zero (for example, the 15th and 16th from the end,
having XCorr values close to 100%) are likely members of
an unrepresented pharmacofamily, outside of the group of
N pharmacofamilies in question. If, however, the set of
considered pharmacofamilies is known to span the entire
protein family space, then these sequences may be
'distal' pharmacofamily members with characteristics that
are under-represented in the pharmacofamily model used.

The XCorr analysis was automated in a software
application called Gene Family Profiler as follows. The
protein sequences and Hidden Markov Model files described
in Example VII were formatted in FASTA and HMMER 2.1
format, respectively, and read into Gene Family Profiler.
Minor formatting flaws in the sequence file were
automatically identified and corrected by the program.
The sequences were searched with the Hidden Markov Models
using the HMMER 2.1 program, and E values were calculated.
Sequences having E values at or below a predefined cutoff
of 10 were compiled for further analysis (this cutoff E
value can be altered by the user as necessary). For
sequences having E values that passed the cutoff, an
XCorr value was calculated.

A summary of E values and XCorr values for each
sequence was displayed as output from the program. As an
example, the output indicated that sequence 1b61 is most
likely a member of pharmacofamily 1 because it scored an
E value from HMMER passing the cutoff for only this
pharmacofamily Hidden Markov Model and had an XCorr value
of 1 for pharmacofamily 1. The sequence 1nda had E
values passing the cutoff for both pharmacofamily 1 and
pharmacofamily 7. However, the 1nda sequence had XCorr
values of 1.0053 for pharmacofamily 7 and -0.0053 for
pharmacofamily 1, indicating membership in
pharmacofamily 7 rather than pharmacofamily 1.
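The cutoff-then-XCorr step might look like the sketch below. Gene Family Profiler's actual code is not disclosed, so the function name and input layout are hypothetical. The sketch sums ln(E) only over families whose E value passed the cutoff. This assumption is consistent with the reported values: 1b61 passed for a single family and received an XCorr of 1, and the two XCorr values reported for 1nda (1.0053 and -0.0053) sum to 1. The most likely family is taken as the one with the smallest passing E value, which for 1b61 and 1nda coincides with the family of highest XCorr.

```python
import math
from typing import Dict, Optional, Tuple


def profile_sequence(
    evalues: Dict[int, float], cutoff: float = 10.0
) -> Tuple[Dict[int, float], Optional[int]]:
    """Return (XCorr per passing family, most likely family) for one sequence.

    evalues maps pharmacofamily numbers to HMMER E values; families whose
    E value exceeds the cutoff are excluded before XCorr is computed.
    """
    # Keep positive E values at or below the cutoff (ln(E) must be defined).
    passing = {fam: e for fam, e in evalues.items() if 0 < e <= cutoff}
    if not passing:
        return {}, None  # candidate for a secondary similarity search
    logs = {fam: math.log(e) for fam, e in passing.items()}
    total = sum(logs.values())
    if total == 0:
        raise ValueError("XCorr is undefined when the ln(E) values sum to zero")
    scores = {fam: lg / total for fam, lg in logs.items()}
    # Pick by smallest E rather than largest XCorr: if every passing E exceeds 1,
    # the largest XCorr would belong to the family with the largest E.
    best = min(passing, key=passing.get)
    return scores, best
```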

The Gene Family Profiler software application
was further programmed to carry out a secondary search for
sequences that did not have a probability of belonging to
any of the 8 pharmacofamilies represented by the Hidden
Markov Models. If no significant similarities to the
pharmacofamilies were found for a sequence in the primary
search with the Hidden Markov Models, the sequence was
analyzed by the PSI-BLAST program (Altschul et al.,
Nucleic Acids Res. 25:3389-3402 (1997)) against a library
containing sequences of known members of all
pharmacofamilies. Thus, the automated methods can be
used to find sequences in the family that are similar to
a query sequence independent of pharmacofamily
membership. Results of the secondary search can be used
to further evaluate the similarity of the query sequence
to the family as a whole.
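The primary/secondary dispatch can be sketched as below. `secondary_search` is a hypothetical callable standing in for a PSI-BLAST run against the library of known pharmacofamily members; no particular PSI-BLAST command line or wrapper is implied, and the default cutoff of 10 mirrors the one described above.

```python
from typing import Callable, Dict, List, Tuple, Union


def classify_with_fallback(
    query_id: str,
    evalues: Dict[int, float],
    secondary_search: Callable[[str], List[str]],
    cutoff: float = 10.0,
) -> Tuple[str, List[Union[int, str]]]:
    """Primary HMM-based assignment, falling back to a similarity search.

    Returns ("primary", families ordered by E value) when any pharmacofamily
    model scores the query at or below the cutoff, otherwise
    ("secondary", IDs of similar known family members from secondary_search).
    """
    hits = sorted((e, fam) for fam, e in evalues.items() if e <= cutoff)
    if hits:
        return "primary", [fam for _, fam in hits]
    # No significant pharmacofamily match: compare against all known members instead.
    return "secondary", secondary_search(query_id)
```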

Throughout this application various
publications have been referenced. The disclosures of
these publications in their entireties are hereby
incorporated by reference in this application in order to
more fully describe the state of the art to which this
invention pertains.



Although the invention has been described with
reference to the disclosed embodiments, those skilled in
the art will readily appreciate that the specific details
are only illustrative of the invention. It is understood
that modifications which do not substantially affect the
activity of the various embodiments of this invention are
also included within the definition of the invention
provided herein. Therefore, it should be understood that
various modifications can be made without departing from
the spirit of the invention. Accordingly, the invention
is limited only by the following claims.



